Abstract



We find a countable partition P on a Lebesgue space, labeled 
{1,2,3...}, for any non-periodic measure preserving transformation T 
such that P generates T and for the T, P process, if you see an n at
time −1 then you only have to look at times −n, 1 − n, ..., −1 to know
the positive integer i to put at time 0. We alter that proof to extend
every non-periodic T to a uniform martingale (i.e. continuous g func- 
tion) on an infinite alphabet. If T has positive entropy and the weak 
Pinsker property, this extension can be made to be an isomorphism. 
We pose remaining questions on uniform martingales. In the process 
of proving the uniform martingale result we make a complete analysis 
of Rokhlin towers which is of interest in and of itself. We also give an 
example that looks something like an i.i.d. process on Z^2 when you
read from right to left but where each column determines the next if
you read left to right.

1 Introduction



The purpose of this paper is to show that the intuition one might have for 
finite state stationary processes gets destroyed when one considers infinite 
state stationary processes. The proof of the primary theorem we will use to 
accomplish this destruction of your intuition can be altered to get a theorem 
about uniform martingales (also called continuous g functions). This still 
leaves open questions we have been interested in regarding uniform martin- 
gales which we will pose, but we will attempt to suggest the basic technique 
that can be used to resolve these questions. In the process of getting the uni- 
form martingale result we will need a tool involving Rokhlin towers. Since 
we need this tool anyway we have decided to clean up the theory of Rokhlin 
towers in this paper. 

Comment 1. You can't accomplish this destruction of your intuition if you 
insist on looking at partitions with finite entropy. Just about everything 
which is true of finite partitions is also true of infinite partitions with finite 
entropy. Hence by necessity all infinite partitions in this paper have infinite 
entropy. 

Countable and uncountable partitions can be strange. For processes gen- 
erated by a finite alphabet, one definition of zero entropy is that a process 
has zero entropy iff the past determines the future. Parry [11] Theorem 8.2
page 90 showed that every transformation has a countable generator such
that the past determines the future. (For the present paper generator means
that the whole process (past and future) generates, but Parry uses the word
generator to mean that the past generates.) Here we improve on that result
with Theorem 1.1 below:

Definition 1. Throughout this paper T is a one to one measure preserving
transformation on a Lebesgue space Ω with probability measure P such that
P({ω : T^i(ω) = ω for some i ≠ 0}) = 0. This is what we mean by a non-periodic
transformation.

Comment 2. If T is one to one and measure preserving on a Lebesgue space
then T^{-1} is also (measurability of T^{-1} is standard theory but not obvious)
and if T is non-periodic then T^{-1} is also.

Definition 2. When T is a measure preserving transformation and P is a
partition, the P, T process is the process in which every ω ∈ Ω is assigned
to a doubly infinite word ..., X_{-2}, X_{-1}, X_0, X_1, X_2, ... where X_i is the element
of P containing T^i(ω). The word that ω is mapped to is called its T, P name. The
T, P process is the process of T, P names endowed with the measure induced
by the map from points to T, P names. Since T is measure preserving it is a
stationary process.

Comment 3. Let T be a measure preserving transformation, P a partition,
and ..., X_{-2}, X_{-1}, X_0, X_1, X_2, ... be the P, T process. If the P, T name of
ω is ..., a_{-2}, a_{-1}, a_0, a_1, a_2, ..., then the P, T name of T(ω) is ..., b_{-2}, b_{-1}, b_0, b_1, b_2, ...
defined by b_i := a_{i+1} for all i, so the map from ω to its doubly infinite name
is a homomorphism onto the doubly infinite sequences of elements of P with
induced measure where the transformation on doubly infinite sequences shifts
a word to the left. If the T, P process separates points (i.e. if there is a set
Z of measure 0 such that if ω_1 ≠ ω_2 are not in Z they have different T, P
names) then this map is an isomorphism.

Definition 3. If the P, T process separates points we say that P generates T 
which implies that T is isomorphic to the standard shift on the T, P process. 
In that case by abuse of notation we will say that T is isomorphic to the T, P 
process. 
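
The name map and the shift relation b_i := a_{i+1} of Comment 3 can be illustrated on a toy system. The sketch below is only an illustration under loud assumptions: the space is the finite cyclic group Z/10 with T(x) = x + 3 (which is periodic, so it is NOT one of the transformations this paper studies), the two-piece partition `label` is made up, and the helper name `name_window` is hypothetical.

```python
from typing import Callable, List

def name_window(x: int,
                T: Callable[[int], int],
                T_inv: Callable[[int], int],
                label: Callable[[int], int],
                radius: int) -> List[int]:
    """Return [X_{-radius}, ..., X_0, ..., X_{radius}] with X_i = label(T^i(x))."""
    backward = []
    y = x
    for _ in range(radius):
        y = T_inv(y)
        backward.append(label(y))
    backward.reverse()              # now ordered X_{-radius}, ..., X_{-1}
    forward = [label(x)]
    y = x
    for _ in range(radius):
        y = T(y)
        forward.append(label(y))    # X_1, ..., X_{radius}
    return backward + forward

# Toy (periodic!) system, assumed purely for illustration.
M = 10
def T(x: int) -> int: return (x + 3) % M
def T_inv(x: int) -> int: return (x - 3) % M
def label(x: int) -> int: return 0 if x < 4 else 1   # hypothetical partition

a = name_window(2, T, T_inv, label, 4)       # coordinates -4..4 of the name of x
b = name_window(T(2), T, T_inv, label, 3)    # coordinates -3..3 of the name of T(x)
# b_i = a_{i+1}: coordinates -3..3 of b are coordinates -2..4 of a.
shift_ok = (b == a[2:])
```

Applying T to the point shifts its name one place to the left, which is the homomorphism property described in Comment 3.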

Here is the main theorem of the paper. 

Theorem 1.1. Every non-periodic transformation has a countable generator,
with pieces labeled {1,2,3,...}, such that, letting h(ω) be the piece of the
partition where ω is,

(*) if h(T^{-1}(ω)) = n, then h(T^{-n}(ω)), h(T^{1-n}(ω)), h(T^{2-n}(ω)), ..., h(T^{-1}(ω))
determines h(ω).

Comment 4. This says that every non-periodic measure preserving trans-
formation is isomorphic to a stationary process on the integers such that if
there is an n in position −1 then the −1, −2, ..., −n terms determine the
0 term.
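
To make condition (*) concrete, here is a minimal sketch of a consistency check on a finite sample of a sequence of positive integers. It is an assumption-laden illustration, not part of the proof: the function name `satisfies_star` is hypothetical, and passing the check on a finite sample is only a necessary condition for (*), not a proof of it.

```python
from typing import Dict, Sequence, Tuple

def satisfies_star(x: Sequence[int]) -> bool:
    """Check, on a finite sample, that whenever x[t-1] == n the block
    x[t-n], ..., x[t-1] is always followed by the same symbol x[t]."""
    seen: Dict[Tuple[int, ...], int] = {}
    for t in range(1, len(x)):
        n = x[t - 1]
        if n < 1 or t - n < 0:
            continue                      # block would start before the sample
        window = tuple(x[t - n:t])        # n symbols ending at time t-1
        if seen.setdefault(window, x[t]) != x[t]:
            return False                  # same block, two different successors
    return True

periodic_ok = satisfies_star([1, 2] * 5)
conflict = satisfies_star([1, 1, 2, 1, 1, 2])
```

In the second sample the block (1) is followed once by 1 and once by 2, so the symbol at time −1 alone fails to determine the symbol at time 0.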

Comment 5. Theorem 1.1, as in virtually all theorems and definitions in
ergodic theory, allows an exceptional set of measure 0. From here on we
will not mention that a set of measure 0 must be removed when we make a
theorem, definition or any other statement. The reader should assume it.

We will accomplish Theorem 1.1 in two ways: 1) in such a way that the
future says very little about the past and 2) in such a way that the inverse
of this process also obeys (*) (which implies that the future determines the
past and furthermore does so very finitistically). In the process of proving
this theorem we prove a very strong version of the Rokhlin tower theorem.

Definition 4. B is said to be the base of a Rokhlin tower of height H if
B, T(B), T^2(B), ..., T^{H-1}(B) are disjoint. In that case the collection
B, T(B), ..., T^{H-1}(B) is called a Rokhlin tower of size H and the complement
of the union of the tower is called the error set of the tower.

Definition 5. We will call a Rokhlin tower an Alpern tower if for any ω in
the error set of the Rokhlin tower, T(ω) is in the base of the tower.

Comment 6. (B is the base of an Alpern tower of height H) iff
(B, T(B), ..., T^{H-1}(B) are disjoint and B, T(B), ..., T^H(B) cover the space) iff (it
is a Rokhlin tower of height H and it is impossible for both ω and T(ω) to
be in the error set.)
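
Definitions 4 and 5 can be checked mechanically on a finite toy system. The sketch below assumes, purely for illustration, the cyclic rotation T(x) = x + 1 on Z/7 (a periodic system, so not one covered by the theorems here); all function names are hypothetical.

```python
from typing import Callable, Iterable, List, Set

def tower_levels(base: Iterable[int], height: int,
                 T: Callable[[int], int]) -> List[Set[int]]:
    """Return the levels B, T(B), ..., T^{height-1}(B)."""
    levels = [set(base)]
    for _ in range(height - 1):
        levels.append({T(x) for x in levels[-1]})
    return levels

def is_rokhlin_base(base: Iterable[int], height: int,
                    T: Callable[[int], int]) -> bool:
    """Definition 4: the levels are pairwise disjoint."""
    levels = tower_levels(base, height, T)
    return sum(len(level) for level in levels) == len(set().union(*levels))

def error_set(space: Iterable[int], base: Iterable[int], height: int,
              T: Callable[[int], int]) -> Set[int]:
    """Complement of the union of the tower."""
    return set(space) - set().union(*tower_levels(base, height, T))

def is_alpern_base(space: Iterable[int], base: Iterable[int], height: int,
                   T: Callable[[int], int]) -> bool:
    """Definition 5: Rokhlin, and T maps the error set into the base."""
    base = set(base)
    if not is_rokhlin_base(base, height, T):
        return False
    return all(T(x) in base for x in error_set(space, base, height, T))

space = range(7)
def rot(x: int) -> int: return (x + 1) % 7   # toy transformation (assumption)

alpern = is_alpern_base(space, {0, 3}, 3, rot)         # return times 3 and 4
rokhlin_only = is_rokhlin_base({0}, 3, rot) and not is_alpern_base(space, {0}, 3, rot)
not_rokhlin = not is_rokhlin_base({0, 2}, 3, rot)
```

With base {0, 3} and height 3 the return times to the base are 3 and 4, matching the characterization in Comment 6: the single error point 6 is mapped into the base.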

Alpern towers were generalized to two dimensions in both [13] and [17]
and to flows by Rudolph [15].

Definition 6. A sequence of two or more Rokhlin towers have nested bases 
(resp. nested error sets) if the base (resp. error set) of each (except the last 
term of the sequence when the sequence is finite) contains the base (resp. 
error set) of the next. 

Comment 7. We will need to use a sequence of Alpern towers in both
Theorem 1.1 and a theorem we will develop later in this introduction, Theorem 1.7
(below), but in the proof of Theorem 1.7 we will also need them to have nested
bases so we will need to establish the following.

Lemma 1.2. Let T be a measure preserving non-periodic transformation.
For sufficiently rapidly increasing n_i, there are sets B_i such that

1) B_{i+1} ⊂ B_i for all i

2) B_i, T(B_i), T^2(B_i), ..., T^{n_i − 1}(B_i) are disjoint

3) B_i, T(B_i), T^2(B_i), ..., T^{n_i}(B_i) cover the space

Comment 8. When we say that n_i are rapidly increasing we mean more
precisely that

1) n_{i+1} > 2n_i^2 + 2 for all i and

2) Σ_{i=1}^{∞} (n_i^2 / n_{i+1}) < ∞

Comment 9. This says that for sufficiently rapidly increasing n_i we can get
a sequence of Alpern towers of height n_i with nested bases.
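
One concrete way to produce heights that grow fast enough is sketched below. The recursion n_{i+1} = 2^i (2 n_i^2 + 3) and the starting value 2 are assumptions chosen for illustration, not the paper's choice; the code reads the two conditions of Comment 8 as n_{i+1} > 2 n_i^2 + 2 and summability of n_i^2 / n_{i+1}.

```python
from fractions import Fraction
from typing import List

def rapid_heights(count: int, n1: int = 2) -> List[int]:
    """Heights n_1, ..., n_count with n_{i+1} = 2^i * (2 n_i^2 + 3) (illustrative choice)."""
    heights = [n1]
    for i in range(1, count):
        n = heights[-1]
        heights.append(2 ** i * (2 * n * n + 3))
    return heights

def ratio_terms(heights: List[int]) -> List[Fraction]:
    """Exact terms n_i^2 / n_{i+1} of the series in condition 2)."""
    return [Fraction(a * a, b) for a, b in zip(heights, heights[1:])]

heights = rapid_heights(6)
terms = ratio_terms(heights)
condition_1 = all(b > 2 * a * a + 2 for a, b in zip(heights, heights[1:]))
# Each term is below 2^-(i+1), so every partial sum stays below 1/2.
condition_2_partial = sum(terms) < Fraction(1, 2)
```

Since n_i^2 / n_{i+1} < 2^{-(i+1)} for this choice, the series in condition 2) is dominated by a geometric series.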

Steve Alpern established (2) and (3) of Lemma 1.2, and the establishment
of (1), (2) of Lemma 1.2 and that the union of the sets in (2) cover
almost the full space is an old well known result. Here we just combine the
techniques of these two proofs together to get all three at once.

We will indicate in a comment that by tightening (1) of Comment 8 we
also get all your dreams to come true: a sequence of Alpern towers with nested
bases, nested error sets, and error sets whose size is a rapidly decreasing
fraction of the size of the bases. This would appear to completely finish
the theory of Rokhlin towers with the ultimate theorem if Lehrer and Weiss
[8] had not opened up a can of worms by asking what happens when you
insist that you restrict to a subset of a prechosen set of positive measure.
We figured that since we have to prove Lemma 1.2 anyway we might as
well complete the theory of Rokhlin towers by finishing what Lehrer and
Weiss started (although some future author might decide that this paper
does not completely finish the study of Rokhlin towers and might find more
to say about them). However, regarding our study of Rokhlin towers, only
Lemma 1.2 is actually used in the rest of this paper and only in the proof of
Theorem 1.7.

What Lehrer and Weiss show is that under very weak preconditions, for 
any set of less than full measure there is a Rokhlin tower with any height you 
wish covering it (except measure 0). In particular you can cover one Rokhlin 
tower (together with a piece of its error set) with another. This allows you 
to arrange that a prechosen tower be the first of a sequence of towers with 
nested error sets and arbitrarily small error sets. In fact since you can choose 
the height before you choose the set you are covering, not only can the error 
sets be made to be small but in fact you can make their measure to be a 
small fraction of the measure of the base of the tower. What if you want 
the prechosen tower to be the first of two towers with nested bases instead 
of nested error sets and still get the error set to be small? Lemma 1.2 does
not allow you to let the first tower be a prechosen tower. Here the answer is 
so easy we will present the proof inside the introduction. 

Lemma 1.3. Let T be a nonperiodic measure preserving transformation and
let S be a set of positive measure which intersects every orbit under T (i.e.
for every ω there is a positive i such that T^i(ω) ∈ S). Then there is a subset
of S which is the base of a Rokhlin tower with arbitrarily small error set (which
cannot, however, be forced to have measure 0).

Comment 10. The precondition is satisfied for any set S of positive measure 
if T is ergodic. 

Proof of Lemma 1.3. Let f(ω) be the least i such that T^i(ω) ∈ S, let ε > 0,
let M be chosen so large that (the probability that f(ω) > M) < ε, let N >
M/ε, let P be the partition consisting of S and the complement of S, and let B be
the base of a Rokhlin tower of height N with error set smaller than ε which
is independent of the partition which breaks the space into N names for
the P, T process (a standard theorem allows you to choose such a B). Then
break the Rokhlin tower into P columns (i.e. call two points in B
equivalent if they have the same P, T name of length N and consider the
columns above the equivalence classes). A rung of such a column is entirely
inside one element of P. Consider a column c. Call the column good if a
point ω in its base obeys f(ω) < M. If c is good let g(c) be the first rung
of it in S. Then the union of g(c) over all good c is the base of a Rokhlin tower
with height N − M and error set smaller than 3ε. □

Comment 11. Letting the S of Lemma 1.3 be the base of a prechosen
Rokhlin tower gives that prechosen tower to be the first of a sequence of two 
towers with nested bases such that the error set of the second is as small 
as you like. This would at first glance make you happy. At second glance 
you would get upset when you realize that in this proof you have to get the 
height of the second tower to be big in order to get the error set small. One 
would want better than that. You would want the size of the error set to be 
able to be a small fraction of the size of the base. Sorry. We have bad news. 

Theorem 1.4. Select a doubly infinite sequence of heads and tails from a
fair coin and let μ be the resulting measure on doubly infinite sequences of
heads and tails. Let A be the event that there is a head at the origin. Then
there does not exist B ⊂ A such that B is the base of a Rokhlin
tower whose error set has measure less than μ(B)/6.

(See acknowledgements) 

What about the possibility that we can get something like Lemma 1.3 to
hold for Alpern towers? Again bad news. We will show

COUNTEREXAMPLE: For every integer n > 3, there exists a Bernoulli
transformation T and an Alpern tower of height n whose error set
has any given size less than 1/(n + 1) and whose base BB does not
contain the base of any Alpern tower of height > n(n − 1) + ⌊n/2⌋          (1)

Comment 12. Actually T can be chosen before the n and indeed T can be 
chosen to be the standard 1/2, 1/2 independent process. The point is that 
the T we actually use has entropy less than that of the 1/2, 1/2 independent
process so we can extend it to the 1/2, 1/2 independent process. This causes
our set BB to have more subsets but adding more subsets to BB does not
harm the proof that the above counterexample works, nor does it harm
the proofs of our corollaries.

Corollary 1.5. Let T be the standard 1/2, 1/2 Bernoulli. For every n > 3
there exists an Alpern tower of height between n and n(n − 1) + ⌊n/2⌋ which
is not the first of any sequence of two Alpern towers with nested bases except
if the second tower is equal to the first up to measure 0.

Proof. Select the Alpern tower of counterexample Equation (1). That
equation implies that Θ = {m : m is the height of an Alpern tower whose base
is a subset of BB} (it is not empty because BB itself is such a subset, so
n ∈ Θ) has a maximum mm ≤ n(n − 1) + ⌊n/2⌋. Select a subset B of BB
which is the base of such a tower of height mm. We claim that the Alpern
tower with base B (henceforth to be called the first base and first tower)
serves as the desired tower. Suppose B is the first of two Rokhlin towers
with nested bases, the second having base BBB. If the height of the second
tower is only 1 it cannot be Alpern because the base is too small. Suppose
there is an ω ∈ B \ BBB. Since the height of both towers exceeds 1 and
since the height of the second tower does not exceed the height of the first
tower because mm is the maximum, it is easy to see that both ω and T(ω)
fail to be in the second tower, so if B \ BBB has positive measure then the
second tower is not Alpern. □

Going back to arbitrary Rokhlin towers: since Lehrer and Weiss get an
arbitrary tower to be the first of a sequence of two with nested errors such
that the second has small error and Lemma 1.3 gets it to be the first of two
with nested bases such that the second has small error, we can hope that we
can get it to be the first with both nested bases and nested errors with small
error for the second. Bad news.

Corollary 1.6. Let T be the standard 1/2, 1/2 Bernoulli transformation. 
There is a Rokhlin tower of arbitrarily large height which is not the first of 
any sequence of two Rokhlin towers with nested bases and nested error sets 
unless the second is the same as the first except for measure 0. 

Proof. The Alpern tower of Corollary 1.5 serves as our example. In fact that
corollary explicitly says Corollary 1.6 once one makes the obvious observation
that if a sequence of two Rokhlin towers has nested error sets, then if the
first is Alpern the second is also Alpern. □

Here are some perverted examples when the alphabet is uncountable.
Jonathan King provided a stationary process ..., X_{-2}, X_{-1}, X_0, X_1, X_2, ...
where each X_i is uniformly distributed on the unit interval and if you list a
subsequence X_{a_0}, X_{a_1}, X_{a_2}, ... in which a_{i+1} > a_i + 1 for every i, the
subsequence is i.i.d. but where X_0 and X_1 determine all X_i. Rokhlin [14] noticed
that any process has an uncountable generator where the conditional entropy
of the present given the future is the full entropy of the process (in fact
reading backwards in time we get a Markov chain) and yet each term determines
the next. This is an easy example so we will present it in this paper. Then in
this paper we will introduce a particularly perverted example of this. Here
we give an easy example of a Z action on an uncountable state space which
looks somewhat like a Z^2 action on 0s and 1s consisting of i.i.d.
random variables taking one value with probability 1/3 and the other
with probability 2/3 if you read from right to left but where one column
determines the next if you read from left to right.

Definition 7. We say that a process (in general not stationary)
..., X_{-2}, X_{-1}, X_0, X_1, X_2, ... is similar to an i.i.d. 1/3, 2/3 process (in this paper
we will simply refer to this as a blue process) if the X_i are independent and
each X_i takes on one of the following distributions: 1 with probability 1/3
and 0 with probability 2/3, or 0 with probability 1/3 and 1 with probability
2/3.

Example 1. The example we introduce in this paper is of a set of random
variables X_{i,j} taking values in {0, 1}, for all integers i, j, such that the
columns form a stationary process (a one dimensional stationary process on
an uncountable state space) and each column determines the next, but such that
reading from right to left (i.e. if you invert the Z action), the process is
a Markov chain where conditioned on a given column the next column is
similar to an i.i.d. 1/3, 2/3 process.

This is an easy example so perhaps the reader may want to try to come 
up with it himself before reading this paper. 

Definition 8. A Uniform Martingale (also called a continuous g function) 
is a stationary process where the convergence of 

the probability of a given letter occurring at time 0 given the n past

to

the probability of that letter occurring at time 0 given the entire past

is actually uniform convergence (i.e. where you only have to know n to know 
how close you are regardless of what past you are talking about). 

Comment 13. This is equivalent to saying that the function taking the 
past to the probability of a given letter occurring at time 0 given the past
is a continuous function of the past when the past is endowed with product 
topology. For this reason the phrase "continuous g function" is used to define 
the probability of the present given the past. 

Uniform Martingales have been studied in many papers including [1], [S],
13 , [H], [12]. In [4] and [5] it was shown respectively that the T, T^{-1}
transformation and any zero entropy transformation can be extended to a uniform
martingale on a finite state space (where the extension takes the form of the
composition of three homomorphisms T × B → U → S → T, where T is
the transformation being extended, S is the uniform martingale and the
existence of the homomorphism S → T is precisely what is meant by saying
that S extends T; B is a Bernoulli transformation with arbitrary entropy
(a specific Bernoulli transformation was used but when reading those papers
it will be obvious that any Bernoulli transformation could have been used)).
In both of these papers a similar technique is used and the alphabets
considered were finite. The technique involved noting that many pieces of past
separately told you the present. We will repeat this technique in this paper.
The construction we make for Theorem 1.1 does not cause many pieces of
past to tell you the present but we will modify it so that it does. Then we
will use the same T × B → U → S → T proof to extend to a uniform martingale
on a countable state space. Actually, although we can carry out such a
proof and get S as the extension (and we will), in fact we could use U as our
desired extension since the only reason for dropping from U to S in [4] and
[5] was to get a finite alphabet, but here S ends up with a countably infinite
alphabet anyway so we might as well use U. Our motivation for dropping
to S in this paper is that perhaps this can help the reader to solve some of
our open problems. Let us be more specific about what we will prove. It is
easy to see that Definition 8 is equivalent to the following for finite alphabet
processes.

Definition 9. A stationary process is a uniform martingale iff the measure 
on the present given the n past converges uniformly as n approaches ∞ on
all pasts in the variation metric. 
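
One way to write Definition 9 as a formula is sketched below. The notation is an assumption introduced only for this display: A is the alphabet, x_{-n}^{-1} abbreviates (x_{-n}, ..., x_{-1}), and the factor 1/2 is one common normalization of the variation metric (the definition does not depend on it).

```latex
% Hedged formalization of Definition 9; A, x_{-n}^{-1} and the factor 1/2 are
% notational assumptions, not taken from the text.
\[
  \lim_{n\to\infty}\;\sup_{(x_i)_{i<0}}\;\frac12\sum_{a\in A}
  \Bigl|\,\mathbb{P}\bigl(X_0=a \,\bigm|\, X_{-n}^{-1}=x_{-n}^{-1}\bigr)
        -\mathbb{P}\bigl(X_0=a \,\bigm|\, X_{-\infty}^{-1}=x_{-\infty}^{-1}\bigr)\Bigr| \;=\; 0 .
\]
% Definition 8, by contrast, asks only that for each fixed letter a
\[
  \lim_{n\to\infty}\;\sup_{(x_i)_{i<0}}
  \Bigl|\,\mathbb{P}\bigl(X_0=a \,\bigm|\, X_{-n}^{-1}=x_{-n}^{-1}\bigr)
        -\mathbb{P}\bigl(X_0=a \,\bigm|\, X_{-\infty}^{-1}=x_{-\infty}^{-1}\bigr)\Bigr| \;=\; 0 .
\]
```

For a finite alphabet the sum has finitely many terms, so the two conditions agree; for a countable alphabet the letterwise condition does not control the whole sum.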

However, Definition 8 and Definition 9 are not equivalent when you
pass to a countable alphabet (Definition 9 is stronger).

Henceforth for both finite and countably infinite alphabets we use
Definition 9 rather than Definition 8 as the definition of uniform
martingale.

This makes the following theorem as strong as possible. 

Theorem 1.7. a) Every non-periodic transformation can be extended to a 
uniform martingale on a countable state space. 

b) In fact, if the transformation can be written as the product of another 
transformation and a nontrivial Bernoulli process, (e.g. if the transformation 
has positive entropy and obeys the weak Pinsker condition) then this extension 
can be made to be an isomorphism. 

The reason that we bring up this topic in the current paper is that it is
almost an application of Theorem 1.1 but not precisely. We have to alter the
proof of Theorem 1.1 but we use the same basic technique of proof. However,
here we need Alpern towers with nested bases.

Obviously it would be nice if we could reduce this to a uniform martingale 
on a finite alphabet. Here are questions we hope someone will be able to solve. 

Question 1: Can every transformation with finite alphabet be extended to a
uniform martingale with finite alphabet? If a countable alphabet is needed,
can the measure on the alphabet be made to have finite entropy?

Question 2: Is every positive entropy transformation with finite alphabet 
isomorphic to a uniform martingale with finite alphabet? (It is easy to see
that it is impossible for an aperiodic zero entropy process to be a uniform
martingale on a finite alphabet.) What about the special case where
the transformation can be written as a product of a Bernoulli and another 
transformation? 

Question 3: Is every transformation isomorphic to a uniform martingale on 
a countable alphabet even if it does not obey weak Pinsker? (It is not known 
whether or not there is a transformation that does not obey weak Pinsker) 

Question 4: Suppose there is a uniform martingale on a 3 letter alphabet 
with entropy less than log(2). Is it isomorphic to a uniform martingale on a 
2 letter alphabet? Can it at least be extended to a uniform martingale on a 
2 letter alphabet? 

2 Acknowledgements 

Our proof of Theorem 1.4 was simplified by Paul Balister. It is his simplified
proof that we use here. It is not only simpler but also better in the sense 
that with our original proof we would not have gotten a fraction anywhere 
near as big as 1/6 in the statement of the result. Karen Johannson noticed
that my original Definition 7 was insufficient.

3 Rokhlin Towers 

THIS SECTION CONSISTS OF 

Proof of Lemma 1.2 together with an extension of Lemma 1.2 when
Comment 8 is tightened

Proofs of Theorem 1.4 and Counterexample Equation (1)
Proof of Lemma 1.2
Preparation for proof of Lemma 1.2

Sublemma: Let m, n be integers with m > 0 and n ≥ m(m − 1). Then n can
be written as a sum of numbers each of which is either m or m + 1.

Proof. Write n as km + r where 0 ≤ r ≤ m − 1 and k ≥ m − 1. Then
n = (k − r)(m) + r(m + 1). □
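
The proof of the sublemma is constructive, and the decomposition can be written out directly. The sketch below follows the proof, reading its hypotheses as m > 0 and n ≥ m(m − 1); the function name is hypothetical.

```python
from typing import List

def split_into_m_and_m_plus_1(n: int, m: int) -> List[int]:
    """Write n as a sum of terms each equal to m or m + 1.

    Follows the sublemma: n = k*m + r with 0 <= r <= m - 1, and
    n >= m*(m - 1) forces k >= m - 1 >= r, so k - r copies of m and
    r copies of m + 1 add up to n.
    """
    if m <= 0 or n < m * (m - 1):
        raise ValueError("need m > 0 and n >= m*(m - 1)")
    k, r = divmod(n, m)
    return [m] * (k - r) + [m + 1] * r

example = split_into_m_and_m_plus_1(23, 4)   # 23 = 5*4 + 3, so two 4s and three 5s
```

The bound n ≥ m(m − 1) is what guarantees k − r ≥ 0; for smaller n a decomposition can fail to exist (for instance n = 11, m = 4).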

Definition 10. Δ means symmetric difference.

Definition 11. The distance between two sets A and B is the measure of
A Δ B.

Comment 14. It is easily seen that distance defined above defines a com- 
plete metric on the class of measurable sets where two sets are regarded to 
be the same if their distance is 0. 

Comment 15. If a sequence of sets forms a Cauchy sequence using the 
above metric, then they converge to a set. However that set is only defined 
up to measure 0, i.e. if a set is the limit of such a sequence and another set 
has distance 0 from that set then the latter set is also a limit of the sequence.
This is not a problem for us because in ergodic theory, when two sets differ
by a set of measure 0 we regard them as the same set.

Before giving a proof of Lemma 1.2 we first give an idea of the proof. The
reader has to read the idea, however, because it is actually part of the proof.

Idea of proof of Lemma 1.2. We will define all the sets B_i we are looking
for as a limit of sets which form a Cauchy sequence in the above metric. Each
B_i will be defined in such a way, letting B_{i,k} be the k-th approximation of B_i
(here B_{i,k} will only be defined for k ≥ i)
and insisting that the following two statements hold:

If we fix k and substitute S_i := B_{i,k} for B_i, i ≤ k,
(1) of Lemma 1.2 will hold for all S_i for i < k, and (2) and (3)          (2)
of Lemma 1.2 will hold for S_i, for all i ≤ k.

In this paper, we will frequently refer to Equation (2) explicitly with the
phrase (referring to Equation (2)).



For fixed i, the distances between B_{i,k} and B_{i,k+1} will be summable.          (3)
(This condition forces Cauchy.)

Once we define the B_{i,j} and establish Equation (2) and Equation (3) it
follows that as k approaches infinity, B_{i,k} converges to a set B_i and Lemma 1.2
will hold.

Proof of Lemma 1.2

Definition 12. For any ω ∈ Ω and any S ⊂ Ω, J(ω, S) is the least positive
integer such that T^{J(ω,S)}(ω) ∈ S.

Comment 16. Theoretically, if T is not ergodic, J(ω, S) could be infinite
(i.e. T^i(ω) may fail to be in S for any i) but in this proof we will be careful
to use sets S such that J(ω, S) is infinite only on a set of measure 0.

Comment 17. Note that (2) and (3) of Lemma 1.2 are equivalent to saying
that J(ω, B_i) is either n_i or n_i + 1 for any ω ∈ B_i.

[Definition of s: It is known that, given that T is aperiodic, one can select
a set s such that the sets

T^i(s) are disjoint for 0 ≤ i < n_1(n_1 − 1)          (4)

and

∪_{i=0}^{∞} T^i(s) = Ω          (5)

which implies J(ω, s) is infinite only on a set of measure 0. There would
exist such an s no matter what positive integer is used instead of n_1(n_1 − 1).
Equation (5) (no matter what small set s we use) is immediate if T is ergodic
and to get such an s when T is not ergodic takes a little work but is not hard
and is known.]

Construction of B_{1,1}.

We can partition s into sets S_i where ω ∈ S_i iff ω ∈ s and J(ω, s) = i. By
Equation (4), T^i(s), 0 < i < n_1(n_1 − 1), is disjoint from T^0(s) = s (T^0 is the
identity map for any transformation T), and thus S_i = ∅ when i < n_1(n_1 − 1).
By the sublemma start with a nonempty S_i and write i as a sum of some
numbers (say j_i numbers) each of which is either n_1 or n_1 + 1, say
i = a_1 + a_2 + ... + a_{j_i}.

We then let

B_{1,1} = ∪_i ∪_{g=0}^{j_i − 1} T^{Σ_{k=1}^{g} a_k}(S_i)

and (2) and (3) of Lemma 1.2 hold (referring to Equation (2)) for B_{1,1}. Note
that when g = 0, Σ_{k=1}^{g} a_k = 0 so s ⊂ B_{1,1}.
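
The effect of this step along a single orbit segment can be sketched as follows. This is an illustration under assumptions, not the paper's formula: it takes a point of s whose return time to s is `return_time`, and marks the times along the orbit (the partial sums of a decomposition into steps n and n + 1) at which the orbit is placed in the new base. The function name is hypothetical.

```python
from itertools import accumulate
from typing import List

def base_offsets(return_time: int, n: int) -> List[int]:
    """Times 0 <= t < return_time at which the orbit segment visits the new base.

    Consecutive marked times differ by n or n + 1, the first marked time
    is 0 (so the starting point itself is in the base), and the last gap
    ends exactly at return_time, where the orbit is back in s.
    """
    if n <= 0 or return_time < max(1, n * (n - 1)):
        raise ValueError("need n > 0 and return_time >= max(1, n*(n - 1))")
    k, r = divmod(return_time, n)
    steps = [n] * (k - r) + [n + 1] * r           # decomposition from the sublemma
    partial_sums = [0] + list(accumulate(steps))  # 0, a_1, a_1 + a_2, ..., return_time
    return partial_sums[:-1]                      # drop return_time: it belongs to the next segment

offsets = base_offsets(23, 4)
gaps = [b - a for a, b in zip(offsets + [23], (offsets + [23])[1:])]
```

Every gap between consecutive visits is n or n + 1, which is exactly the reformulation of (2) and (3) of Lemma 1.2 given in Comment 17.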

Now fix an l and assume we have defined B_{i,j} for all i ≤ j ≤ l
obeying (2) and (3) of Lemma 1.2, and (1) of Lemma 1.2 for i < j (referring
to Equation (2)). We wish to define B_{j,l+1} for all j ≤ l + 1 obeying (2) and
(3), and (1) of Lemma 1.2 for j < l + 1 (referring to Equation (2)).

Construction of B_{l+1,l+1}.

Construct B_{l+1,l+1} precisely the same way that we constructed B_{1,1},
replacing n_1 with n_{l+1} throughout the construction (this involves redefining s
accordingly, i.e. using a different s).

Construction of B_{j,l+1} for j < l + 1:

Our strategy will be to first define B_{l,l+1}, then B_{l−1,l+1}, then B_{l−2,l+1},
etc. until we define B_{1,l+1}. The strategy for getting from B_{i,l+1} to B_{i−1,l+1}
will be the same for all i except i = l + 1 so we will first show how to get
from B_{l+1,l+1} to B_{l,l+1}, then more quickly indicate how to get from B_{l,l+1} to
B_{l−1,l+1} and then even more quickly indicate how to accomplish the general
case. The increase in the brevity of our explanation at each stage is justified
by the assumption that as we go from one case to the next we expect the
reader to see the pattern.

Fix an ω ∈ B_{l+1,l+1}. We are now about to define values a(ω) and b(ω)
which will both exist and satisfy a(ω) < b(ω). Recall that since ω ∈ B_{l+1,l+1}
it follows that J(ω, B_{l+1,l+1}) is either n_{l+1} or n_{l+1} + 1.

Definition 13. Let a(ω) be the least integer greater than (n_l)(n_l − 1)
such that T^{a(ω)}(ω) ∈ B_{l,l} and let b(ω) be the greatest integer less than
J(ω, B_{l+1,l+1}) − (n_l)(n_l − 1) such that T^{b(ω)}(ω) ∈ B_{l,l}.

We wish to emphasize that ω ∈ B_{l+1,l+1} and that for such ω

T^{a(ω)}(ω) ∈ B_{l,l} and T^{b(ω)}(ω) ∈ B_{l,l}          (6)

Since, by induction, (3) of Lemma 1.2 holds for B_{l,l} (referring to
Equation (2)), it follows that for any ω_1, J(ω_1, B_{l,l}) ≤ n_l + 1. For ω ∈ B_{l+1,l+1},
applying that to ω_1 = T^{n_l(n_l − 1)}(ω) we have that

a(ω) ≤ n_l(n_l − 1) + n_l + 1 = n_l^2 + 1          (7)

and applying it to ω_1 = T^{n_{l+1} − n_l(n_l − 1) − n_l − 2}(ω) we have that

b(ω) ≥ n_{l+1} − n_l(n_l − 1) − n_l − 1 = n_{l+1} − n_l^2 − 1          (8)

so by (1) of Comment 8, a(ω) < b(ω). Keeping in mind that T^0 is the identity
transformation for any transformation T,

Definition 14. let changeset = ∪_{ω ∈ B_{l+1,l+1}} {T^i(ω) : 0 ≤ i < a(ω) or b(ω) <
i < J(ω, B_{l+1,l+1})}.

The "0" in the expression 0 ≤ i < a(ω) in the definition of changeset
establishes that B_{l+1,l+1} ⊂ changeset. Let sameset be the complement of
changeset.
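
The step from (7) and (8) to a(ω) < b(ω) can be spelled out. The display below is a sketch that reads (7) and (8) as the bounds a(ω) ≤ n_l^2 + 1 and b(ω) ≥ n_{l+1} − n_l^2 − 1, and (1) of Comment 8 as n_{l+1} > 2 n_l^2 + 2; these readings are assumptions about the garbled source.

```latex
% Worked inequality behind "a(omega) < b(omega)"; the forms of (7), (8) and of
% Comment 8(1) used here are assumed readings.
\[
  b(\omega) - a(\omega)
  \;\ge\; \bigl(n_{l+1} - n_l^{2} - 1\bigr) - \bigl(n_l^{2} + 1\bigr)
  \;=\; n_{l+1} - \bigl(2 n_l^{2} + 2\bigr)
  \;>\; 0 .
\]
```

So the initial stretch 0 ≤ i < a(ω) and the final stretch b(ω) < i < J(ω, B_{l+1,l+1}) used to define changeset do not overlap.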

For all j ≤ l we will construct B_{j,l+1} in such a way that B_{j,l+1} ∩ sameset
= B_{j,l} ∩ sameset. This establishes Equation (3) because it means that the
measure of B_{j,l+1} Δ B_{j,l} is bounded above by the measure of changeset which
(by i and ii below) is bounded above by 2[n_l^2 + 1]/n_{l+1}, so Equation (3) follows
from (2) of Comment 8.

i) By the definition of changeset, Equation (7) and Equation (8),

changeset ⊂ [ ∪_{i=0}^{n_l^2} T^i(B_{l+1,l+1}) ] ∪ [ ∪_{i=1}^{n_l^2+1} T^{n_{l+1}+1−i}(B_{l+1,l+1}) ]

ii) Since T is measure preserving, for all k, T^k(B_{l+1,l+1}) has the same
measure as B_{l+1,l+1}, which is less than or equal to 1/n_{l+1} because B_{l+1,l+1}
satisfies (2) of Lemma 1.2 (referring to Equation (2)).
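
The way i) and ii) combine with Comment 8 can be written as one chain. This is a sketch: the symbol μ for the measure is an assumption of this display (the text writes "the measure of"), and the bound 2[n_l^2 + 1]/n_{l+1} is taken from the paragraph above as read from the source.

```latex
% Sketch of the summability argument; \mu denotes the probability measure
% (notation assumed for this display only).
\[
  \mu\bigl(B_{j,l+1}\,\triangle\,B_{j,l}\bigr)
  \;\le\; \mu(\text{changeset})
  \;\le\; \frac{2\,(n_l^{2}+1)}{n_{l+1}}
  \;\le\; \frac{4\,n_l^{2}}{n_{l+1}},
  \qquad\text{hence}\qquad
  \sum_{l}\mu\bigl(B_{j,l+1}\,\triangle\,B_{j,l}\bigr)
  \;\le\; 4\sum_{l}\frac{n_l^{2}}{n_{l+1}} \;<\;\infty .
\]
```

The middle step uses n_l ≥ 1, and the final series converges by (2) of Comment 8, so for each fixed j the approximations B_{j,l} form a Cauchy sequence in the metric of Definition 11.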

Comment 18. The right way to visualize changeset is to think of it as a
union of orbit intervals from T^{b(ω_1)}(ω_1) to T^{a(ω_2)}(ω_2) where ω_1 ∈ B_{l+1,l+1}
and ω_2 = T^{J(ω_1, B_{l+1,l+1})}(ω_1), i.e., the next element of the orbit of ω_1 after ω_1
which is in B_{l+1,l+1}.
Throughout this section the way we will change B_{j,l} to get B_{j,l+1} will be to
change it on such intervals so that B_{j,l} "matches up" with B_{j,l+1} at both
T^{b(ω_1)}(ω_1) and T^{a(ω_2)}(ω_2).

Construction of B_{l,l+1}.

We want to alter B_{l,l} on changeset to get B_{l,l+1} so that (2) and (3) of
Lemma 1.2 continue to be true for B_{l,l+1} (referring to Equation (2)) and
furthermore so that B_{l+1,l+1} becomes a subset of B_{l,l+1}.

By the sublemma, using the same technique we used to construct B_{1,1}
and B_{l+1,l+1}, for every ω ∈ B_{l+1,l+1}, we can produce a finite set of points in
changeset

S_1(ω) ⊂ {T^i(ω) : 0 ≤ i ≤ a(ω)},

listing the terms of S_1(ω) as

S_1(ω) = {d_0, d_1, d_2, ..., d_k}

and we can produce a finite set of points in changeset

S_2(ω) ⊂ {T^i(ω) : b(ω) ≤ i < J(ω, B_{l+1,l+1})},

listing the terms of S_2(ω) as

S_2(ω) = {e_0, e_1, e_2, ..., e_n}

in such a way that

d_0 = ω, either T^{n_l} d_i = d_{i+1} or T^{n_l+1} d_i = d_{i+1} for i < k; d_k = T^{a(ω)}(ω),

e_0 = T^{b(ω)}(ω), either T^{n_l} e_i = e_{i+1} or T^{n_l+1} e_i = e_{i+1} for i < n, and either
T^{n_l} e_n = T^{J(ω, B_{l+1,l+1})}(ω) or T^{n_l+1} e_n = T^{J(ω, B_{l+1,l+1})}(ω).

Let B_{l,l+1} = [B_{l,l} ∩ sameset] ∪ [ ∪_{ω ∈ B_{l+1,l+1}} (S_1(ω) ∪ S_2(ω)) ].
Note that for ω ∈ B_{l+1,l+1},

T^{a(ω)}(ω) ∈ B_{l,l+1} and T^{b(ω)}(ω) ∈ B_{l,l+1}          (9)

For ω ∈ B_{l+1,l+1} the fact that we insist that d_0 = ω in our definition
of S_1(ω) establishes that B_{l+1,l+1} ⊂ B_{l,l+1}. It is easy to see (2) and (3)
of Lemma 1.2 (referring to Equation (2)) for B_{l,l+1} if you keep in mind
that T^{a(ω)}(ω) and T^{b(ω)}(ω) are both in B_{l,l} ∩ B_{l,l+1} (by Equation (6) and
Equation (9)), that B_{l,l+1} ∩ sameset = B_{l,l} ∩ sameset, and that (2) and (3)
of Lemma 1.2 already hold for B_{l,l}.

Comment 19. We just established $B_{l,l+1}$ by breaking up big intervals in
changeset into smaller intervals of size $n_l$ and $n_l + 1$. Now we are
going to break those subintervals into subintervals of size $n_{l-1}$ and
$n_{l-1} + 1$, and those into subintervals of size $n_{l-2}$ and
$n_{l-2} + 1$, etc. Readers may feel at this point that the rest of the
proof is clear and that they can skip the rest of the proof of this lemma.
In that case, skip to the next bold face sentence.

Construction of Bj^ij+i. 

Let 61 = {oj: there exists oji G -87+1,7+1 such that T°-^'^^\uji) = tu}. Let 
O2 = changeset \ 0i so that changeset = BiU02 

Comment 20. If Ui G i?7+i,7+i, (X'2 G 57+1,7+1, 1 < ^1 < nj+i, 1 < ^2 < 
n7+i, and if T^^uji = T^'^uj2, then ki = k2 and coi = U2 because other- 
wise if we just have ki = ^2 we contradict injectivity of T and if ki 7^ /c2 
then T^i57+i,7+i intersects T'^'^Bf^ij^i which means (by injectivity ) that 
T^^^^Bj^ij^i intersects T'^2~^i?7+i,7+i contradicting (2) of Lemma [L2] (re- 
ferring to Equation ([2])). This shows uniqueness of ui in the definition of Gi 
and avoids ambiguity in the proof of the following claim. 

Claim: For all u G -87,7+1 H 62, every T^u) where < i < J{u, Bij+i) 
is also in 02- 

Proof. We have to analyze our definitions of 02, a{u) and b{u). Since u G 
02, 00 = T^{ui) for ui G i?7+i,7+i where < j < a{ui) or b^Ui) < j < 
J(a;i, i?7+i,7+i). Select i with < z < J(a;, ^7,7+1). 

Case 1: < j < a^Ui): By time i,0 < i < J{uj, 57,7+1), T\uj) has not yet 
returned to -87,7+1 so since T^u) = T*+-'(a;i), by Equation ([9]), j + z < a{ui) 
and thus T'{ujj = T^+'{ui) G 02- 
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Case 2: b{uJi) <j< J{ui, Bj^ij^i): By time 2,0 <i < J{uj, Bij^i),T'^{uj) = 
TJ+*(wi) is not yet in Bjj+i D Bj^ij+i so i + j < J(wi, □ 

We will alter Bj_ij to get Bi_ij+i on Gi U 63 where 63 = {T'{u) : 
uj G Bf j^i n ©2 and < i < J{u, Bij^i)}. The claim guarantees that 
this won't effect sameset. Actually O3 = 62 because using a proof similar 
to the proof of the claim, if we start on a point in 62 \ and read 

backwards in its orbit until we reach we will not leave 02- Select 

u G n ©2. Since rij > nj^iini^i — 1) select finite set S{uj) C {T\uj) : 

< z < J{uj,Bi i^i)},S{uj) = {dQ,di,d2, ...dk} such that d^ = u, either 
T"/-id, = di+i or T'^i-^+^di = d,+i for i < k. If T-^i^'^'J+i^uj) G ©i, 
arrange that dk = T'^^^'^'''+'^\ijj). Otherwise arrange that either T^i-^d^ = 
rpj(u;,Bij+,)i^^^ or T'^i-i+^dk = r-^('^'^^'^+i)(w) and then let 
= {Bj^ij n sameset) U |J S{u) 

For induction purposes we note that 

For ui G Bi+ij+i, both T''^'^'\uJi) and T^^'^'\uji) are in (10) 
because 

T"'^'^'^\ui) is the dk where u is the last element 

of the orbit of wi before T''^'^'\uJi) which is in 5/ f+i 

and 

T^('^i)(wi) is the do where u = T''^'^'\ui) which 
works because 

T^^'^^H^i) e Bjj+i by Equation (E]). 

Since both T"(^i(a;i) and T^('^i(a;i) are in Bij n Bi_ij+i C n 
from (Equation (|H]), Equation flTU]) and that Bj j C is part 

of our induction hypotheses), the way fl changeset was just con- 

structed together with the fact that Equation (j2]) works for Bi^ij implies 
that Equation (|2]) works for 

Construction of Bjj^i,j < I — 1: 

As before for all u G -Bj+i,/+in92, all T*(c<;) where < i < J{ijJ, Bj^ij^i) 
are also in B2. Proceed as above. □ 
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This is all you need of this section to follow the rest of the paper. 
You can safely skip the rest of this section if you wish.

XXX Extending Lemma 1.2

Before we extend the lemma we must first extend the sublemma. Fix an
$n$. Let $N \gg n$. Subtract $n + 1$ from $N$ and then divide by $n$ so
that the integer value of the quotient is $Q = \lfloor (N - n - 1)/n \rfloor$
with a remainder of $r$ and you have
$N = n + 1 + Qn + r = n + 1 + (Q - r)n + r(n + 1)$ where $0 \le r < n$.
This expresses $N$ as a sum of "$n$"s and "$n + 1$"s where the number of
"$n$"s in the expression is $Q - r > Q - n$ and the number of "$n + 1$"s is
$r + 1 \le n$. We have established

Extended sublemma: Fix an $n$, $N$, and $R > 0$. If $N$ is so large that
$(\lfloor (N - n - 1)/n \rfloor - n)/n > R$ (e.g. if
$N > Rn^2 + n^2 + 2n + 1$) then $N$ can be written as a sum of terms each
of which is either $n$ or $n + 1$ such that

(the number of "$n$"s) / (the number of "$n + 1$"s)

is at least $R$ but such that there is at least one "$n + 1$". (11)
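The decomposition behind the extended sublemma can be checked mechanically. The following is a minimal sketch, not part of the original argument: the function names `decompose` and `large_enough` are hypothetical, and `R` is taken to be a positive integer for simplicity.

```python
def decompose(N, n):
    """Write N as a sum of n's and (n+1)'s following the construction in the text.

    Returns (count_n, count_n_plus_1) with
    N == count_n * n + count_n_plus_1 * (n + 1).
    """
    # N = (n + 1) + Q*n + r with 0 <= r < n
    Q, r = divmod(N - n - 1, n)
    if Q < r:
        raise ValueError("N is too small for this construction")
    # (Q - r) copies of n, and r + 1 copies of n + 1 (the leading n + 1 included)
    return Q - r, r + 1


def large_enough(n, R):
    """Smallest integer N with N > R*n^2 + n^2 + 2n + 1 (the sufficient condition)."""
    return R * n * n + n * n + 2 * n + 2


n, R = 7, 10
N = large_enough(n, R)
count_n, count_n1 = decompose(N, n)
assert count_n * n + count_n1 * (n + 1) == N
assert count_n1 >= 1               # at least one "n + 1"
assert count_n >= R * count_n1     # ratio of "n"s to "n + 1"s is at least R
```

Since the count of "$n + 1$"s is always at least one, the terms can be ordered so that the last one added is an "$n + 1$".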

It is time for us to pay attention to the proof we just concluded to see
exactly when we used the sublemma and what happens when we replace it
with the extended sublemma. The reason we use the sublemma over and
over in this proof is that we are trying to establish Alpern towers. Adding
$n$ means that you are on the base of one of these towers and that you have
to wait time $n$ until you reach the base again. Adding $n + 1$ means you
have to wait until time $n + 1$ before reentering the base. In this way we
obtain an Alpern tower of height $n$. When you add $n + 1$ that means that
when you leave the base, the last time before reentering it you will be in
the error set. To get nested error sets you want to make sure that

Every time in the proof that you add a bunch of "$n$"s and "$n + 1$"s
to get a large number, the last term you add is an "$n + 1$".

That way when considering two successive bases $B_1$ and $B_2$ (where $B_2$
is the smaller one) you can be sure that after leaving $B_2$, the last time
before you reenter $B_2$ you are in the error set of $B_1$. This gives nested
error sets. The extended sublemma promises that at least one of the terms
being added is an "$n + 1$" so we can make sure the last term added is an
"$n + 1$". The only time you see error at all is when you add "$n + 1$"
whereas every time you see either "$n$" or "$n + 1$" you enter the base
once. Thus

(the ratio of the measure of the error to the measure of the base) =
(number of "$n + 1$"s) / (total number of "$n$"s and "$n + 1$"s), which by
Equation (11) is bounded above by $1/(R + 1)$.

But to apply the extended sublemma we first need its preconditions to
hold. Just make the necessary changes to make that possible. First, the
expression $n_i(n_i - 1)$ in the definitions of $a(\omega)$ and $b(\omega)$
(Definition 13) has to be replaced by $R_i n_i^2 + n_i^2 + 2n_i + 1$ where
$1/(R_i + 1)$ is the desired upper bound for the ratio of the measure of the
error set of the $i$th tower to the measure of the base of the $i$th tower.
To get $b(\omega) > a(\omega)$ you will have to tighten (1) of Comment [H]
accordingly. We summarize all this as follows.

Comment 21. If the heights of the towers are allowed to increase sufficiently 
rapidly, we can get a sequence of Alpern towers which have nested bases, 
nested error sets and error sets decreasing as rapidly as you might want even
in comparison to the bases, but the ratios of the size of the error sets to
the size of the bases that you want determine how fast your towers must grow.

XXX Proof of Theorem 1.4

Proof. We let $T$ be the shift to the right and assume the existence of $B$.
Randomly select $\omega$ in the space, select a huge $N$ and let
$S = \{T(\omega), T^2(\omega), \ldots, T^N(\omega)\}$, and let

Ev = "the number of elements of $S$ in the error set is less than
$N\mu(B)/5$".

We will derive a contradiction by showing that the probability of Ev
approaches 0 as $N$ approaches $\infty$. Assume Ev. Let $n$ be the number of
elements of $S$ in the error set. Then

1) $n < N\mu(B)/5$

Let $H$ be the height of the tower. Since $B$ is the base of the tower it
follows that

2) $\mu(B) < 1/H$

Now list the elements of $S \cap B$ in order as
$S_1 = \{\omega_1, \omega_2, \omega_3, \ldots, \omega_k\}$. This defines $k$.
Obviously nothing in $S_1$ is in the error set of the tower since $B$ is
the base of the tower. There are $k - 1$ towers completely in $S$ and perhaps
a piece of the tower preceding those towers and a piece of the tower
succeeding those towers (This is abuse of notation. When we say that "There
are $k - 1$ towers" we mean "you pass through the tower $k - 1$ times"). It
follows from the fact that each tower has height $H$ that

3) $(k - 1)H + n < N < (k + 1)H + n$.

First let us choose the number of terms in $S$ before $\omega_1$ which are
not in the error set. Since the height of the tower is $H$ the number of such
terms is less than $H$ so we can choose this number to be either
$0, 1, 2, \ldots$ or $H - 1$.

Choice 1: The number of ways we can choose the number of nonerror
terms before $\omega_1$ in $S$ is $H$.

Now we are about to choose $k + 1$ numbers $a_1, a_2, \ldots, a_{k+1}$.
Regarding the elements of $S_1$ as being a subsequence of the elements of $S$,

$a_1$ is the number of elements of $S$ in the error set listed before $\omega_1$.

$a_2$ is the number of elements of $S$ in the error set between $\omega_1$ and $\omega_2$.

$a_3$ is the number of elements of $S$ in the error set between $\omega_2$ and $\omega_3$.

...

$a_k$ is the number of elements of $S$ in the error set between $\omega_{k-1}$ and $\omega_k$.

$a_{k+1}$ is the number of elements of $S$ in the error set after $\omega_k$.

These $k + 1$ nonnegative integers have to add to $n$. If you add one to each
of them you get $k + 1$ positive numbers that add to $n + k + 1$. There is a
standard trick which shows that the number of ways to pick such a sequence
of $k + 1$ positive integers is $\binom{n+k}{k}$.

Choice 2: The number of ways to choose the sequence is
$\binom{n+k}{k} = \binom{n+k}{n}$.
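The "standard trick" is the stars and bars count: sequences of $k + 1$ nonnegative integers summing to $n$ number $\binom{n+k}{k}$. As an illustrative sanity check only (the helper name `count_gap_sequences` is hypothetical), a brute-force enumeration for small values agrees with the binomial coefficient:

```python
from itertools import product
from math import comb


def count_gap_sequences(n, k):
    """Brute-force count of (a_1, ..., a_{k+1}), nonnegative integers summing to n."""
    return sum(1 for a in product(range(n + 1), repeat=k + 1) if sum(a) == n)


# Compare the enumeration with the closed form for a few small cases.
for n in range(0, 5):
    for k in range(0, 4):
        assert count_gap_sequences(n, k) == comb(n + k, k) == comb(n + k, n)
```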

Once we have made the first two choices, the values of $S_1$ as elements of
$S$ are determined (e.g. if $k > 21$ you know for which value of $j$ we have
that $\omega_{21} = T^j(\omega)$). This selects out $k$ explicit terms in $S$
which are in $S_1$ and since $S_1 \subset B \subset A$ we get $k$ explicit
terms in $S$ which are in $A$. For any $k$ such terms the probability that
they are all in $A$ is $1/2^k$.

We have established that the probability of Ev is bounded above by

$x := H \binom{n+k}{n} \frac{1}{2^k}$

and all that remains is to show that $x$ is tiny. We will make use of
approximations to make our arguments easier and cleaner but we are sure the
reader will agree that the inequalities and the speed at which we prove $x$
to go to zero overpower the errors in these approximations. If the reader
thinks we are cutting it close, we could have made it blatantly obvious if we
had used 10 instead of 6 in the statement of Theorem 1.4, allowing us to use
9 instead of 5 in the definition of Ev, and that would have been good enough
to show that we have a counterexample. For your convenience we will repeat
the equations we have established.

1) $n < N\mu(B)/5$

2) $\mu(B) < 1/H$

3) $(k - 1)H + n < N < (k + 1)H + n$.

Let $M = N/H$. $\approx$ means approximately equal.
From (1) and (2)

4) $n < M/5$

From (3),

5) $k \approx (N - n)/H$ which by (4) is essentially $M$

Now we analyze the $H$, the $\binom{n+k}{n}$ and the $1/2^k$ in the
definition of $x$.

First term $= H$

Second term $= \binom{n+k}{n} < \binom{M/5 + M}{M/5} \approx \binom{M}{M/5} < M^{M/5}/(M/5)! \approx (5e)^{M/5}$

Third term $= 1/2^k \approx 1/2^M$

Multiplying all this together gives $H((5e)/32)^{M/5}$ which goes rapidly to
0 as $M$ goes to $\infty$. □
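To see how quickly the final bound decays, one can evaluate $H((5e)/32)^{M/5}$ numerically. This is only an illustration of the approximate bound above: the function name `bound` and the sample values of $H$ and $M$ are arbitrary choices, not taken from the paper.

```python
import math


def bound(H, M):
    """Approximate upper bound H * (5e/32)^(M/5) on the probability of Ev."""
    return H * (5 * math.e / 32) ** (M / 5)


# 5e/32 is about 0.42, so for a fixed tower height H the bound decays
# geometrically in M = N/H.
H = 1000
values = [bound(H, M) for M in (10, 100, 1000)]
assert values[0] > values[1] > values[2]
assert values[2] < 1e-60
```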

XXX Proof of Equation (1)

Proof. Let $\epsilon$ be the desired size of your error set. Then

$\epsilon < 1/(n + 1)$ (12)

Consider the following two words in "0"s and "1"s:

a "0" followed by $n - 1$ "1"s which we will refer to as an $n$ block and

a "0" followed by $n$ "1"s which we will refer to as an $n + 1$ block.

We will independently concatenate $n$ blocks and $n + 1$ blocks to get an
infinite word where each time we select an $n + 1$ block with probability
$\epsilon n/(1 - \epsilon)$ which is less than 1 by Equation (12). This gives
a generic word for an $n + 1$ step mixing Markov process and such processes
are known to be Bernoulli. We use that process, let $T$ be the shift of a
doubly infinite word on that process and $BB$ be the event that there is a
"0" at the origin. It is easily seen that $BB$ is the base of an Alpern tower
of height $n$ and error set of size $\epsilon$. Now select
$N > n(n - 1) + \lfloor n/2 \rfloor$. Divide $N - \lfloor n/2 \rfloor$ by $n$
to get an integer part $Q$ with a remainder of $r$ and that enables us to
write $N$ as
$N = \lfloor n/2 \rfloor + Qn + r$ where $Q \ge n - 1$ and $r \le n - 1$

Rewrite that as

$N = \lfloor n/2 \rfloor + r(n + 1) + (Q - r)n$ (13)

and note that $Q - r \ge 0$. For any point (i.e. doubly infinite word)
$\omega$ we say that $\omega$ is in the beginning of its $BB$ block if it is
in $BB$, i.e. if it has a "0" at the origin. Points near the middle of a $BB$
block obviously have a "1" at the origin. Although $n + 1$ blocks may be rare
it is nonetheless true that any finite sequence of blocks occurs with
positive probability and hence will eventually occur. In particular the
following will eventually occur in the output of a randomly chosen $\omega$.
You will see

$r$ blocks which are $n + 1$ blocks which we will call D blocks

then

$Q - r$ blocks which are $n$ blocks which we will call E blocks

then

$r$ blocks which are $n + 1$ blocks which we will call F blocks

then

$Q - r$ blocks which are $n$ blocks which we will call G blocks

Suppose for a contradiction that there is an Alpern tower of size $N$ whose
base $CC$ is a subset of $BB$. Let $\omega_1$ be the translate of $\omega$
whose origin is the 1 at the end of the last E block (or the last D block if
$Q - r = 0$). Look where $\omega_1$ is in the $N$ tower and read backwards
until you get to the bottom of the $N$ tower to get a point $x$ which is in
$CC$. Then read forward until you get to the next time that you are in $CC$
at a point $y$. The distance you have to travel in the orbit to get from $x$
to $y$ is either $N$ or $N + 1$. Since $x$ and $y$ are in $CC \subset BB$,
they both have a 0 at the origin. By Equation (13), $x$ is in the beginning
of a D or E block (It can't be partway below the first D block or it would
have a 1 at the origin). Again by Equation (13) you have to go through
exactly $r$ blocks which are $n + 1$ blocks to get from $x$ to $y$ and every
other block you go through is an $n$ block. But that is impossible because
(again by Equation (13)) that would put $y$ near the middle of its $BB$ block
causing it to have a 1 at the origin. □
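The choice of the probability $\epsilon n/(1 - \epsilon)$ for an $n + 1$ block in the proof above can be checked with exact arithmetic. The sketch below assumes, as the construction suggests, that the error set consists of the extra final "1" of each $n + 1$ block, so that its long-run density is $p/(n + p)$ when $n + 1$ blocks are chosen independently with probability $p$. The function names are hypothetical.

```python
from fractions import Fraction


def long_block_probability(eps, n):
    """Probability of choosing an (n+1) block: eps * n / (1 - eps)."""
    return eps * n / (1 - eps)


def error_density(p, n):
    """Long-run fraction of symbols that are the extra final "1" of an (n+1) block.

    Blocks are i.i.d.: an n block has length n and an (n+1) block has length n + 1,
    so the mean block length is n + p and a block contributes an error symbol
    with probability p.
    """
    return p / (n + p)


n = 5
eps = Fraction(1, 10)              # must satisfy eps < 1/(n + 1), as in (12)
p = long_block_probability(eps, n)
assert 0 < p < 1                   # a valid probability precisely because of (12)
assert error_density(p, n) == eps  # the error set has the desired size
```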



4 Proof of Theorem 1.1 



We assume T to be a non-periodic transformation. 

We will first prove Theorem 1.1 in such a way that the future says very
little about the past and then do the other extreme; we will modify the proof
so that the inverse of the process also obeys the theorem.

Lemma 4.1. For any nonperiodic measure preserving transformation $T$ there
is a function $f$ from $\Omega$ to $\mathbb{N}$ and a function $g$ from
$\mathbb{N}$ to $\mathbb{N}$ such that

1) $|f(T(\omega)) - f(\omega)| \le 1$ for all $\omega$.

2) For all $\omega$ and all nonnegative integers $i$, there is a member of
$\{T(\omega), T^2(\omega), \ldots, T^{g(i)}(\omega)\}$ where $f$ takes on a
value greater than $i$.

Definition 15. After proving this lemma $f$ and $g$ will henceforth be the
functions above given by this lemma except in sections 4 and 6. Sections 4
and 6 are the sections where we discuss uncountable partitions, Theorem 1.1
is not relevant, and $f$ will have a different meaning.

Proof. For the purposes of this lemma we don't need nested Alpern towers.
We only need a sequence of Alpern towers where the height of the $i$th tower
is $n_i$ such that

$n_i > 3i$ for all $i$. (14)

$\sum_{i=1}^{\infty} (i/n_i) < \infty$. (15)

Fix $i$ and $\omega$. Define $j$ to be the number such that $\omega$ is in
the $j$th rung of the $i$th tower (let it be $\infty$ on the error set).
Define a function $f_i$ by

$f_i(\omega) = \begin{cases} j & \text{if } 1 \le j \le i \\ 2i - j & \text{if } i + 1 \le j \le 2i \\ 0 & \text{otherwise} \end{cases}$ (16)

and it is clear that $f_i$ obeys (1) of Lemma 4.1.

Each rung of the $i$th tower has size at most $1/n_i$ so the support of $f_i$
has measure at most $2i/n_i$. By Equation (15), Borel-Cantelli says that
(after removing a set of measure 0) every $\omega$ is in only finitely many
such supports. That means the following definition makes sense. Let $f$ be
the maximum of all $f_i$, $i$ going from 1 to $\infty$. It is easy to verify
that the maximum of a bunch of functions obeying (1) of Lemma 4.1 obeys (1)
of Lemma 4.1 so $f$ obeys (1) of Lemma 4.1. Let $g(i) = n_{i+1} + 1$. Since
the $(i+1)$th tower is an Alpern tower of size $n_{i+1}$, any stretch of
orbit of size $g(i)$ eventually hits the $(i+1)$th rung of the $(i+1)$th
tower and hence in such a stretch there is a point $\omega$ where
$f_{i+1}(\omega) = i + 1$ and hence $f(\omega) \ge i + 1$. □
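The function $f_i$ in (16) is a "tent" on the bottom $2i$ rungs of the $i$th tower. The following sketch reads the first branch of (16) as the rung index $j$ (the reading under which $f_i$ changes by at most 1 per step) and checks property (1) of Lemma 4.1 along a simulated orbit segment. The function names and the encoding of the error set as `None` are assumptions made for illustration.

```python
def f_i(i, j):
    """Tent function on the i-th tower; j is the rung index (None on the error set)."""
    if j is None:
        return 0
    if 1 <= j <= i:
        return j
    if i + 1 <= j <= 2 * i:
        return 2 * i - j
    return 0


def orbit_values(i, height, long_returns):
    """Values of f_i along an orbit segment that passes through the tower repeatedly.

    long_returns[t] says whether the t-th pass ends with one extra step
    in the error set before returning to the base.
    """
    values = []
    for extra in long_returns:
        values.extend(f_i(i, j) for j in range(1, height + 1))
        if extra:
            values.append(f_i(i, None))
    return values


i, height = 4, 13   # height > 3 * i, as in (14)
vals = orbit_values(i, height, [False, True, True, False])
assert all(abs(a - b) <= 1 for a, b in zip(vals, vals[1:]))  # property (1)
assert max(vals) == i                                        # value i at rung i
```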



Completion of proof of Theorem 1.1

Now let $P_i$ be an increasing sequence of finite partitions which separates
points and label each piece of each $P_i$ with a distinct positive integer,
i.e. for any $i$ and $j$, a piece of $P_i$ and a piece of $P_j$ are labeled
with the same integer only when $i = j$ and they are the same piece. Let $G$
be an injection from finite sequences of integers to positive integers such
that

$G(n_1, n_2, n_3, \ldots, n_k) > n_1$ for all finite sequences $n_1, n_2, \ldots, n_k$. (17)
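The paper does not specify $G$; any injection satisfying (17) will do. One possible choice, given here purely as an illustration and not as the authors' construction, encodes a sequence through prime factorization: the $k$th term $n_k$ is mapped to a nonnegative integer by a zigzag bijection and then stored as an exponent of the $k$th prime. The names `zigzag`, `primes`, and `G` below are hypothetical.

```python
def zigzag(z):
    """Bijection from the integers onto the nonnegative integers."""
    return 2 * z if z >= 0 else -2 * z - 1


def primes():
    """Yield 2, 3, 5, ... by trial division (adequate for short sequences)."""
    found = []
    candidate = 2
    while True:
        if all(candidate % p for p in found):
            found.append(candidate)
            yield candidate
        candidate += 1


def G(seq):
    """Encode a finite integer sequence as the product of p_k ** (zigzag(n_k) + 1).

    Every exponent is at least 1, so unique factorization recovers both the
    length of the sequence and each term; hence G is injective.
    """
    code = 1
    for p, z in zip(primes(), seq):
        code *= p ** (zigzag(z) + 1)
    return code


# The first factor alone is 2 ** (zigzag(n_1) + 1), which exceeds n_1, so (17) holds.
assert G([3, -1, 0]) > 3
```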

We now proceed to define the positive integer valued function $h$ discussed
in the statement of Theorem 1.1. For any $\omega$ where $f(\omega) = 0$,
define $h(\omega)$ to be $g(1)$. Now fix $i > 0$ and suppose $h$ is defined
for every $\omega$ for which $f(\omega) < i$. We now suppose $f(\omega) = i$
and describe how we define $h(\omega)$. Select the least positive $m$ such
that $f(T^m(\omega)) > i$ (the existence of such an $m$ is guaranteed by (2)
of Lemma 4.1). Then $h(T(\omega)), h(T^2(\omega)), \ldots, h(T^{m-1}(\omega))$
are already defined by induction. Now let $a_0, a_1, a_2, \ldots, a_m$ be the
integers labeling the pieces of $P_i$ containing
$\omega, T(\omega), \ldots, T^m(\omega)$ respectively. We define $h(\omega)$
to be

$h(\omega) := G(g(i + 1), a_0, a_1, a_2, \ldots, a_m, -1, h(T(\omega)), h(T^2(\omega)), \ldots, h(T^{m-1}(\omega)))$

Note that since $-1$ is the only negative term in the definition of $h$, it
serves as a comma separating the terms $a_j$ from the terms $h(T^j(\omega))$.
In other words, since $G$ is an injection

$h(\omega)$ tells you all the $a_j$, $1 \le j \le m - 1$, and all the
$h(T^j(\omega))$, $1 \le j \le m - 1$, and the $-1$ lets you know which is
which. (Recall that $i := f(\omega)$ and $m$ is the least positive integer
such that $f(T^m(\omega)) > i$.) (18)

Actually in this case the $-1$ is pointless since you already know $m$ to be
half the number of terms (including the $-1$) but we are including it so that
we can generalize later.

Now the map

$\omega \to \ldots, h(T^{-2}(\omega)), h(T^{-1}(\omega)), h(\omega), h(T(\omega)), h(T^2(\omega)), \ldots$

is a homomorphism from $T$ to a stationary process which is an isomorphism if
it separates points (in which case the countable partition defined by
"$\omega_1$ is in the same piece of the partition as $\omega_2$ iff
$h(\omega_1) = h(\omega_2)$" is called a generator of $T$). Thus the proof of
Theorem 1.1 will be complete when we show Equation (19) and Equation (20)
below:

The above stationary process separates points, i.e. for any two distinct
points $\alpha$ and $\beta$ in $\Omega$, there is a $j$ such that
$h(T^j(\alpha)) \ne h(T^j(\beta))$. (19)

If $h(T^{-1}(\omega)) = r$, then
$h(T^{-r}(\omega)), h(T^{1-r}(\omega)), \ldots, h(T^{-1}(\omega))$ determines
$h(\omega)$. (20)

Proof of Equation (19): Let $\omega_1$ and $\omega_2$ be two distinct
elements of $\Omega$. Then there exists a $j$ such that $P_j$ separates them
and by (2) of Lemma 4.1 there is a positive $k$ such that
$f(T^{-k}(\omega_1)) > j$. By selecting the smallest such $k$ and letting
$f(T^{-k}(\omega_1)) =: i_1 > j$, by Equation (18), $h(T^{-k}(\omega_1))$
encodes which piece of $P_{i_1}$ and hence which piece of $P_j$ (because the
partitions are increasing) $\omega_1$ is in and thus is different from
$h(T^{-k}(\omega_2))$.

Proof of Equation (20): Let $\omega_1 = T^{-1}(\omega)$ so that
$h(\omega_1) = r$. Let $i := f(\omega_1)$. By Equation (17),
$r = h(\omega_1) > g(i + 1)$ so by (2) of Lemma 4.1 there is an element of
$\{T^{-1}(\omega), T^{-2}(\omega), \ldots, T^{-r}(\omega)\}$ where $f$ takes
on a value of more than $i + 1$. Let $\omega_2 := T^{-k}(\omega)$ where $k$ is
the minimum positive number with $f(T^{-k}(\omega)) > i + 1$ so if we let
$f(T^{-k}(\omega)) =: i_0$, then $i_0 > i + 1$. Then $-k \ge -r$. We will be
done if we can show that

$h(\omega_2)$ determines $h(\omega)$. (21)

Now let $s$ be the least nonnegative number such that
$f(T^s(\omega_2)) > i_0$, which is greater than $i + 1$. Since
$i = f(T^{-1}(\omega))$, by (1) of Lemma 4.1 $s > k$ so Equation (21) follows
by Equation (18) applied to $\omega_2$ where we replace $i$ with $i_0$. □

Comment 22. We actually proved more than we said we would. All we said
we would prove is that
$h(T^{-r}\omega), h(T^{1-r}\omega), \ldots, h(T^{-1}\omega)$ determines
$h(\omega)$. We actually proved that one of those values alone encodes enough
information to determine $h(\omega)$.

Comment 23. By proving the above theorem we have among other things
reproved the countable generator theorem. We just defined $h$ as
$h(\omega) := G(g(i + 1), a_0, a_1, a_2, \ldots, a_m, -1, h(T(\omega)), h(T^2(\omega)), \ldots, h(T^{m-1}(\omega)))$
where $a_0, a_1, a_2, \ldots, a_m$ are the integers labeling the pieces of
$P_i$ containing $\omega, T(\omega), \ldots, T^m(\omega)$ respectively.

However, suppose we already have a finite or countable generator $P$ before
starting this proof and labeled the pieces of $P$ as integers. We can simply
write
$h(\omega) := G(g(i + 1), a, h(T(\omega)), h(T^2(\omega)), \ldots, h(T^{m-1}(\omega)))$
where $a$ is the element of $P$ containing $\omega$ if we also insist that
$f(\omega) = 0 \Rightarrow h(\omega) = G(g(1), a)$, because this defines $h$
to be finer than $P$ which we already know to generate, so $h$ generates the
$\sigma$-algebra and hence we don't need the $a_1, a_2, \ldots, a_{m-1}$.

Comment 24. Now suppose $h$ has been defined as in Comment 23. The
purpose of the following examples and analysis is to show that although our
theorem says that the past determines the future in a very strong finitistic
way, when you read the other way and look at how the future affects the past,
the conditional measure on the present given the future has almost all the
full entropy of the original process. The intuitive idea of this is that if
you look at the future of the $h$ process, the only thing it encodes is the
future of the $P$ process together with a knowledge of the future $f$ process
and future $g$ process. The future $f$ and $g$ processes are determined by
where you will be in the Rokhlin towers. Thus the only information you have
about the past of the $P$ process given the future of the $h$ process, that
you would not know just by looking at the future of the $P$ process, is where
you will be in the Rokhlin towers, and if these towers are built with tiny
and rapidly decreasing bases that is not very much information. Until we say
otherwise, $h$ is defined as in Comment 23, not as in the proof of the
theorem, but of course the theorem still holds.

Example 1: Start off with the standard 2 shift with canonical 1/2, 1/2
generator whose pieces will be denoted by $H$, $T$, to suggest that we are
looking at infinitely many independent flips of a fair coin ($H$ meaning
heads and $T$ meaning tails). Now cross that process with another aperiodic
process with small entropy (you can even assume that the process we are
crossing with has entropy zero). We consider the proof of Theorem 1.1 for
this product process. Let $(a, b)$ be a generator of the small entropy
process so that the entire process has generator
$((a, H), (b, H), (a, T), (b, T))$ which we can write as $(1, 2, 3, 4)$. Now
arrange that all our Rokhlin towers are measurable with respect to the small
entropy factor so that $f$ and $g$ are measurable with respect to the small
entropy factor. Then define $h$ as in Comment 23 except make sure that $h$
takes on an odd number when $\omega$ is in 1 or 2 and an even number when
$\omega$ is in 3 or 4 (which we can do by controlling our definition of $G$).
Now when we look at the future $h$ process (i.e. $h(1), h(2), \ldots$), the
only thing it encodes is the future $(H, T)$ process together with the values
of $f$ and $g$ in the future. But the $f$ and $g$ processes are independent of
the $(H, T)$ process, and since the future $H$, $T$ values are independent of
the present $H$ or $T$ it follows that even conditioned on the future $h$
process, the present value of $h$ is odd with probability 1/2. This means
that as we look backwards in time all the randomness of the $H$, $T$ process
is still there.

Example 2: (generalization of previous example): Now start with an
arbitrary $P$ process (we will assume a two set generator $(1, 2)$ for
simplicity but the reader probably has enough imagination to figure out how
we would handle an arbitrary generator). Do the same as we did in the
previous example, crossing with an aperiodic transformation of arbitrarily
small entropy (perhaps 0) and arranging for $f$ and $g$ to be measurable with
respect to that transformation and $h$ to be odd iff we are on the piece
labeled "1" of $P$. Then if we just look at the oddness and evenness of $h$
we see the original $P$ process and the probability of an odd number in the
present given the future $h$ process is the same as the probability of a "1"
given the future $P$ process so that all the randomness of the $P$ process is
maintained as we look backwards in time.

Example 3: (General case except we use a two set generator for simplicity)
Do the same as in the previous example again assuming generator $(1, 2)$ but
this time don't bother crossing it with anything. By choosing your Rokhlin
towers to have small and rapidly decreasing bases, we can arrange that a
typical $f$ and $g$ name is exponentially big (i.e. that the $f$ and $g$
processes have small entropy). Again let the evenness or oddness of the $h$
process read off the $P$ process. Given the future $h$ name, all we know is
the future $f$ and $g$ names and future $P$ name. The theory of relative
entropy (in particular Pinsker's formula) implies that the relative entropy
of the $P$ process over the $f$ and $g$ processes is only slightly less than
the entropy of the $P$ process. If, for example, the $P$ process has entropy
1/3, then even when conditioned on the entire future $h$ process, the
relative entropy of the $P$ process over the $f$ process is only slightly
less than 1/3, i.e. the expected entropy of the two set partition

[the value of $h$ at time 0 is even, the value of $h$ at time 0 is odd]

given the future of $h$ is only slightly less than 1/3.

Comment 25. We now do the opposite extreme. We arrange for Theorem 1.1
to hold in both directions. The above examples show that we can arrange for
the past to determine the future in this very deterministic way while the
past given the future can be made to have almost the full entropy of the
process. We now point out that if we had wanted to, we could instead have had
the theorem work in both directions, i.e. not only would an $n$ at time $-1$
have meant that $h$ at times $-n, \ldots, -2, -1$ determines the $h$ at time
0, but also an $n$ at time 1 would imply that $h$ at times $1, 2, \ldots, n$
determines $h$ at time 0.

Accomplishing Comment 25

Simply replace the sentences

"Select the least positive $m$ such that $f(T^m(\omega)) \ge i + 1$ (the
existence of such an $m$ is guaranteed by (2) of Lemma 4.1). Then
$h(T(\omega)), h(T^2(\omega)), \ldots, h(T^{m-1}(\omega))$ are already
defined by induction."

in the proof of Theorem 1.1 with the sentences

"Select the least positive $m$ such that $f(T^m(\omega)) \ge i + 1$ and the
least positive $n$ such that $f(T^{-n}(\omega)) \ge i + 1$ (the existence of
such an $m$ and $n$ is guaranteed by Lemma 4.1). Then
$h(T^{1-n}(\omega)), \ldots, h(T^{-1}(\omega)), h(T(\omega)), h(T^2(\omega)), \ldots, h(T^{m-1}(\omega))$
are already defined by induction."

and then define $h(\omega)$ to be

$G(g(i + 1) + 1, a_0, a_1, a_2, \ldots, a_m, -1, h(T^{1-n}(\omega)), \ldots, h(T^{-1}(\omega)), h(T(\omega)), h(T^2(\omega)), \ldots, h(T^{m-1}(\omega)))$

and then you can carry out the proof in both directions. There is no reason
to add more $a_i$s because their only purpose is to assure that $h$ is a
generator.

5 Uncountable Partitions 

Comment 26. We now consider processes on an uncountable alphabet. The
following is a known trick (this is [13] already mentioned in the
introduction). Suppose we have a process with a generator $P$ (finite or
countably infinite) and we want to get an uncountable generator which behaves
perversely. Just let the piece of your partition that $\omega$ is in be
$(p_0, p_1, p_2, \ldots)$ where $p_i$ is the piece of $P$ containing
$T^i(\omega)$. Then the term at time $-1$ completely determines the term at
time 0 but when reading from future to past you get a Markov chain with all
the randomness of the original process. However we have found a particularly
perverse example of this phenomenon.

To motivate this example consider a 2 dimensional process in which each
lattice point of the plane is endowed with a random variable which takes on 1
with probability 2/3 and 0 with probability 1/3 and suppose these variables
are all independent. This completely defines a process. Here is another way
to define the same process. Just say that

running the process backwards, the columns form a stationary process
which is in fact a Markov chain in which the conditional probability
of the 0 column given the 1 column is always the 1/3, 2/3 i.i.d. measure.
(22)

Hence it is impossible to obtain a process which is defined by
Equation (22) and still have the $-1$ column determine the 0 column. The
following example shows that you can almost do that.

Define a (generally non-stationary) sequence of random variables
$\ldots, X_{-2}, X_{-1}, X_0, X_1, X_2, \ldots$
to be blue if they are independent of each other and each one takes on one of
the following two distributions: 0 with probability 1/3 and 1 with
probability 2/3, or 1 with probability 1/3 and 0 with probability 2/3. Since
a blue process is not in general stationary, the entropy of a blue process is
not defined but it is intuitively obvious that a blue process from an entropy
standpoint is the same as the 1/3, 2/3 i.i.d. process. Hence, given the
previous paragraph it is rather surprising that we can define a process on
the 2 dimensional lattice points such that the columns form a stationary
process, the $-1$ column determines the 0 column with probability 1, and yet
the column process is a backwards Markov chain where the conditional
probability of the 0 column given the 1 column is always blue. Although it is
surprising that we can do that, it is also easy, so we suggest that the
reader try to do it himself before reading onward.

We define the process by defining the backwards Markov chain. Let $f$
be any infinite to 1 map from the integers onto the integers. We assume we
know the 1 column and define the measure on the 0 column (independent of
the 2, 3, ... columns). We simply define it to be

the blue process which on point $(0, i)$ takes on the
same value as $(1, f(i))$ with probability 2/3
and the opposite value with probability 1/3. (23)

The $-1$ column determines the 0 column because if you know the $-1$ column
and you want to determine the value at $(0, i)$ just look at the infinite set
of values on terms $(-1, j)$ where $f(j) = i$ and by the strong law of large
numbers 2/3 of them will be 1 iff the term at $(0, i)$ is 1. To rigorously
define this process we have to define a stationary measure on the columns but
the theory for doing that is analogous to the theory for finite state Markov
processes and we leave that to the appendix. Also analogous is the proof that
this is a mixing Markov process and that such processes are Bernoulli (all
proved in the appendix).
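The recovery of the 0 column from the $-1$ column can be illustrated on a finite window. The sketch below is only a finite stand-in for the construction: the map `j % width` plays the role of the infinite to 1 map $f$, a large finite number of copies replaces the infinite fibre of each site, and a majority vote replaces the exact strong law limit. All names are hypothetical.

```python
import random


def sample_previous_column(col0, copies, rng):
    """Sample a finite window of the -1 column given the 0 column.

    Site j of the -1 column copies site f(j) of the 0 column with probability
    2/3 and takes the opposite value with probability 1/3, where
    f(j) = j % len(col0) stands in for an infinite-to-one map.
    """
    width = len(col0)
    return [col0[j % width] if rng.random() < 2 / 3 else 1 - col0[j % width]
            for j in range(width * copies)]


def recover_column(prev, width):
    """Recover site i of the 0 column by majority vote over the sites j with f(j) = i."""
    recovered = []
    for i in range(width):
        votes = prev[i::width]          # exactly the sites j with j % width == i
        recovered.append(1 if 2 * sum(votes) > len(votes) else 0)
    return recovered


rng = random.Random(0)
col0 = [rng.randint(0, 1) for _ in range(8)]
prev = sample_previous_column(col0, copies=2001, rng=rng)
assert recover_column(prev, len(col0)) == col0
```

With 2001 votes per site, the chance that a majority vote fails is negligible, which mirrors the probability 1 statement in the infinite setting.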

6 Uniform Martingales 

This section modifies the proof of Theorem 1.1 to prove Theorem 1.7 using
Definition 9. Definition [H] is trivially satisfied for the process guaranteed by
Theorem 1.1 because for any integer a there is a fixed time past which you
will know precisely whether or not the integer at time 0 is a given that much
past (actually the theorem does not say that, but if you look at the proof you
will see that not only is it the case that if there is a 5 at time -1 then the
process at times -5,-4,-3,-2,-1 determines the value at time 0, but furthermore
if there is a 5 at time 0 then the process at times -5,-4,-3,-2,-1 determines that
there is a 5 at time 0). However, Definition [H] fails for this process because
given the entire past, the measure at time 0 puts all its mass at one number
and if that number is huge it will take a long time before you can even suspect
what it is. It is Definition 9 which is of interest here.

It was shown that a specific finite state process called the T, T^{-1} process
could be extended to a finite state uniform martingale in [1] and then it
was shown that every zero entropy process could be extended to a finite
state uniform martingale in [5]. In both cases essentially the same technique
was used. We will use that technique again in this paper to show that
by modifying Theorem 1.1 we can establish that every aperiodic stationary
process can be extended to a uniform martingale on a countable state space.

The basic method used in those two papers and in this one for
constructing a uniform martingale is as follows. First we construct two jointly
distributed processes: one an i.i.d. process ..., a_{-2}, a_{-1}, a_0, a_1, a_2, ... of
positive integers called the lookback process, the other an arbitrary process
..., b_{-2}, b_{-1}, b_0, b_1, b_2, ... on a finite or countably infinite alphabet. We
arrange that these processes are jointly distributed so that

The a_i process is an i.i.d. process. (24)

Each a_i is independent of all the following random variables jointly:
all a_j, j < i and all b_j, j < i. (25)

Each b_i is determined by a_i, b_{i-1}, b_{i-2}, ..., b_{i-a_i}. (26)

Comment 27. It is obvious that Equation (25) immediately implies
Equation (24), but we will often say, "Suppose we have Equation (24),
Equation (25) and Equation (26)" because it is useful to get the reader to keep
Equation (24) in mind.

Lemma 6.1. If we can establish ..., a_{-2}, a_{-1}, a_0, a_1, a_2, ...
and ..., b_{-2}, b_{-1}, b_0, b_1, b_2, ... processes obeying Equation (24),
Equation (25), and Equation (26), then ..., b_{-2}, b_{-1}, b_0, b_1, b_2, ... is a
uniform martingale.

Proof. Let μ be the measure on b_0 given the n past (the n past :=
b_{-1}, b_{-2}, ..., b_{-n}) and let ν be the measure on b_0 given the entire past (the
entire past := b_{-1}, b_{-2}, ...). To establish the lemma we must couple μ and ν
so that they usually agree, where exactly how "usually" depends only on n.
(Coupling two measures means establishing a measure on the product space of
the two spaces with the two measures as marginals. Joining is a special case of
coupling.) We are now going to couple μ and ν.

1) Fix b_{-1}, b_{-2}, ..., b_{-n} in accordance with the measure on the n past.

2) Select b_{-n-1}, b_{-n-2}, ... in accordance with the measure on the
entire past from -n-1 onwards given that the n past is
b_{-1}, b_{-2}, ..., b_{-n}.

3) Use the same measure as was used in (2) to select
B_{-n-1}, B_{-n-2}, ...
and choose them to be independent of b_{-n-1}, b_{-n-2}, ... conditioned on
b_{-1}, b_{-2}, ..., b_{-n}.

4) Now we have two complete b pasts

b_{-1}, b_{-2}, ..., b_{-n}, b_{-n-1}, b_{-n-2}, ...

and

b_{-1}, b_{-2}, ..., b_{-n}, B_{-n-1}, B_{-n-2}, ...

Select a_{-1}, a_{-2}, ... so that its joint distribution with
b_{-1}, b_{-2}, ..., b_{-n}, b_{-n-1}, b_{-n-2}, ...
obeys Equation (24), Equation (25) and Equation (26), and then select
A_{-1}, A_{-2}, ... so that its joint distribution with
b_{-1}, b_{-2}, ..., b_{-n}, B_{-n-1}, B_{-n-2}, ...
is the same as the joint distribution of a_{-1}, a_{-2}, ...
and b_{-1}, b_{-2}, ..., b_{-n}, b_{-n-1}, b_{-n-2}, ...

5) Now we select A_0 and a_0. First choose a_0 independent of everything
chosen so far. Note that all we need to know about A_0 is that it is
independent of all other A_{-i}, of b_{-1}, b_{-2}, ..., b_{-n}, and of all B_{-i},
so we can choose A_0 to equal a_0.

Now focus on the second process (the one which uses "B"s) and consider
the measure μ on B_0 conditioned on only b_{-1}, b_{-2}, ..., b_{-n} (in other words,
integrate out over all possible continuations B_{-n-1}, B_{-n-2}, ...), and then on
the first process (the all "b" process) and consider the measure ν on b_0
conditioned on the entire past. If a_0 ≤ n the two values of b_0 are identical (by
(5) and Equation (26)). We have established a coupling of μ and ν which agree
unless a_0 > n and thus the lemma is proved. □
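The key point of the coupling, that two pasts sharing their last n symbols and the same lookback value give the same present symbol whenever the lookback is at most n, can be illustrated with a toy update rule. This is only a sketch: the rule next_symbol, the three-letter alphabet, and the particular pasts are hypothetical and chosen for illustration. Any rule of the form of Equation (26) would behave the same way.

```python
def next_symbol(a0, past):
    """Hypothetical update rule of the form of Equation (26).

    past[0] plays the role of b_{-1}, past[1] of b_{-2}, and so on.
    Only the first a0 entries of the past are read.
    """
    window = past[:a0]
    return (a0 + sum(window)) % 3


def disagreement_bound(lookback_law, n):
    """P(a_0 > n) for a lookback law given as {value: probability}."""
    return sum(p for value, p in lookback_law.items() if value > n)


n = 4
shared = [2, 0, 1, 1]                # b_{-1}, ..., b_{-n}
past_b = shared + [0, 2, 2, 1, 0]    # continuation b_{-n-1}, b_{-n-2}, ...
past_B = shared + [1, 1, 0, 2, 2]    # another continuation B_{-n-1}, ...

# For each possible common lookback value, compare the two present symbols.
agree = {a0: next_symbol(a0, past_b) == next_symbol(a0, past_B)
         for a0 in range(1, 10)}
```

The two symbols can differ only when the lookback exceeds n, so the probability of disagreement under the coupling is at most P(a_0 > n), a quantity that depends only on n.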

Corollary 6.2 (of the proof of Lemma 6.1). If we can establish
..., a_{-2}, a_{-1}, a_0, a_1, a_2, ... and ..., b_{-2}, b_{-1}, b_0, b_1, b_2, ... processes
obeying Equation (24), Equation (25) and Equation (26), then
..., (a_{-2}, b_{-2}), (a_{-1}, b_{-1}), (a_0, b_0), (a_1, b_1), (a_2, b_2), ... is a uniform
martingale.

Comment 28. The reason this corollary was never mentioned in [1] or [5]
was that we did not want to keep the a_i because that would cause our
alphabet to be countably infinite and we wanted it to be finite. In this
paper it will turn out that the ..., b_{-2}, b_{-1}, b_0, b_1, b_2, ... process already
has an infinite alphabet, so it does not make any difference whether we keep
the ..., a_{-2}, a_{-1}, a_0, a_1, a_2, ... or drop them. Our only reason for pointing
out that you can drop them in this paper is that it may help the reader in future
research for solving some of the currently open problems that we mention in
this paper.

Preparation for the proof of Theorem 1.7 a:

Here is how we will proceed. This is exactly the same procedure that was
used in [1] and [5]. We will start with our arbitrary non-periodic
transformation.

Step 1: We need to alter Theorem 1.1. We will then endow our
non-periodic process with a countable generator using the altered form of
Theorem 1.1. This gives us a stationary process ..., c_{-2}, c_{-1}, c_0, c_1, c_2, ...
on a countable alphabet which is isomorphic to the given transformation. This
step uses Alpern towers with nested bases.

Step 2: We will then cross that process with a lookback process
..., a_{-2}, a_{-1}, a_0, a_1, a_2, ...

Step 3: We will then change some of the values c_i to a new letter called
"question mark", notated by the symbol "?". This is done in infinitely many
stages. After making these changes ..., c_{-2}, c_{-1}, c_0, c_1, c_2, ... will become
another process ..., b_{-2}, b_{-1}, b_0, b_1, b_2, ..., i.e. each b_i is either c_i or "?".






Step 4: We will establish Equation (24), Equation (25), and Equation (26)
for ..., a_{-2}, a_{-1}, a_0, a_1, a_2, ... and ..., b_{-2}, b_{-1}, b_0, b_1, b_2, ..., thereby
establishing that ..., b_{-2}, b_{-1}, b_0, b_1, b_2, ... is a uniform martingale by
Lemma 6.1.

Step 5: We will show that ..., b_{-2}, b_{-1}, b_0, b_1, b_2, ... is an extension of
..., c_{-2}, c_{-1}, c_0, c_1, c_2, ..., i.e. that from a given b name we can recover the c
name, i.e. that by looking at the entire b name we can determine what the
question marks were before they were converted to question marks. We will
then be able to conclude that ..., c_{-2}, c_{-1}, c_0, c_1, c_2, ... can be extended to a
uniform martingale on a countable alphabet.

In this paper, as is the case in every case where this procedure has been 
used, in order to establish step 5 it is necessary that the past not only de- 
termine the present but that furthermore a random sampling of the past is 
sufficient to determine the present. This means that it is necessary when 
establishing step 1 for us to get the past to determine the present with 
substantial redundancy (i.e. we need to arrange that many pieces of the past
determine the present). Theorem 1.1 does not provide sufficient redundancy
so we will have to modify the proof of it to get that redundancy.

Proof of Theorem 1.7 a:

Carrying out step 1: For those who read [1] or [5] this is really the only
step you should have to read. Steps 2 through 5 are just repeating the
procedure in those papers.

The whole goal of Theorem 1.1 is to arrange that if h(T^{-1}(ω)) = k
then if you look back k steps you will be able to determine h(ω). Here
we are no longer interested in looking at T^{-1}(ω) but rather at ω itself, so
(1) of Lemma 4.1 is no longer of interest to us. On the other hand, (2) of
Lemma 4.1 says that for all ω and all nonnegative integers i, there is a member
of {T(ω), T^2(ω), ..., T^{g(i)}(ω)} where f takes on a value greater than i, but
here we would like to replace that with

for all ω and all nonnegative integers i, there is
a member of {T(ω), T^2(ω), ..., T^{g(i)}(ω)}
where f takes on i + 1. (27)

Since we are dropping (1) of Lemma 4.1 we can simplify the definition of
f. Just define it to be the largest i such that ω ∈ B_i, where B_i is the base of
the i-th tower, letting it be 0 if there is no such i (again there is a largest
one by Borel-Cantelli). Now instead of defining g(i) to be n_{i+1} + 1, define it
to be 2n_{i+1} + 2.

n_{i+1} + 1 would be sufficient if all we wanted was a k where f(T^k(ω)) > i,
because in any n_{i+1} + 1 consecutive terms there must be a term where you are
in B_{i+1} of Lemma [112]. Unfortunately you might also be in B_j for j > i + 1,
which would imply that you are in B_{i+2} because in this proof we assume the
B_j are nested. But if that happens then in the following n_{i+1} + 1 terms you
cannot conceivably be in B_{i+2} because the height of the i + 2 Alpern tower is
too big. Hence somewhere in those 2n_{i+1} + 2 terms you must be in
B_{i+1} \ B_{i+2}, and for that term f takes on exactly i + 1. Equation (27) is
established. This is the only use of nested bases in this paper.

In the proof of Theorem 1.1 we set h(ω) equal to

GMi + 1) + 1, ao, ai, a2, ...a^, -1, h{T (u) , h{T\tu)) , ...h{T^-\uj)))) 

where m was chosen to be the least number such that f(T^m(ω)) > i and i
is chosen so that f(ω) = i. We now alter that definition. In this section we
can just let h(ω) be any fixed positive integer if f(ω) = 0 (if you want us to
be precise, let h(ω) = 1 if f(ω) = 0). Now we let i := f(ω), assume i > 0,
and assume h(ω') has been defined for all ω' with f(ω') < i. We start with
a rapidly increasing function, say i!!.

Definition 16. A_i = i!! g(i + 1).

Comment 29. By Equation (27), for any ω and any consecutive integers
k, k+1, ..., k + A_i - 1 there are at least i!! integers j such that
f(T^j(ω)) = i + 2.

We now redefine h. 

Definition 17. Recall that f(ω) = i. Let ω_1, ω_2, ..., ω_r be the subsequence
of all elements ω' of {T(ω), T^2(ω), ..., T^{A_i}(ω)} such that f(ω') < i, so
that h has already been defined. For 1 ≤ z ≤ r let Q(z, ω) be the value
of j such that T^j(ω) = ω_z. Let P_k be an increasing sequence of partitions
which separates points, where the pieces are integers as before. Let a_j be the
element of P_i containing T^j(ω) and define

h(ω) := G(a_0, a_1, a_2, ..., a_{A_i}, -1, Q(1, ω), h(ω_1), Q(2, ω), h(ω_2), ..., Q(r, ω), h(ω_r))

h generates for the same reason as before. Recall that in our original
definition of h, if f(T^{-1}(ω)) = k then in the last k terms there is a term which
tells you every h(T^i(ω)) at least until i reaches a j where f(T^j(ω)) > k + 1
and in particular tells you the value of h(ω). Now we do not focus on T^{-1}(ω)
but rather on ω itself. Furthermore we are no longer interested in what
you learn when you look back h(ω), but rather we are looking at what you
learn when you look forward A_i where i := f(ω). Immediately from our new
definition of h,

if f(ω) = i then for any ω' among the terms
T(ω), T^2(ω), ..., T^{A_i}(ω) for which f(ω') < i,
h(ω) determines h(ω'). (28)

Comment 29 and Equation (28) provide the redundancy needed to carry
out our construction.

Notation 1. When we say ω knows its identity we mean that h(ω) is
determined. For a < b, "T^a(ω) tells T^b(ω) its identity" means that h(T^a(ω))
encodes h(T^b(ω)), or we may just say that a tells b its identity when T and
ω are understood.



Carrying out step 2: Let c_i = h(T^i(ω)), which makes c_i a random variable.
Since, as in Theorem 1.1, h is a generator for the transformation, the shift on
the c_i process is isomorphic to T. We now create an extension of T by
attaching ..., c_{-2}, c_{-1}, c_0, c_1, c_2, ... to a lookback process
..., a_{-2}, a_{-1}, a_0, a_1, a_2, ..., which we choose to be an i.i.d. process which
is completely independent of ..., c_{-2}, c_{-1}, c_0, c_1, c_2, .... All that remains is
to describe the distribution of a_0.

Distribution of a_0: We let a_0 take on the value A_i
with probability 1/2^i and then the distribution of the
..., a_{-2}, a_{-1}, a_0, a_1, a_2, ... process is completely defined by the
fact that it is i.i.d. (29)
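As a small illustration of Equation (29), the lookback law can be sampled by drawing an index i with probability 1/2^i and returning the corresponding lookback value. In this sketch the function A passed in is a hypothetical stand-in for the sequence A_i of Definition 16, whose actual values depend on the tower heights and are not computed here.

```python
import random


def sample_lookback(rng, A):
    """Draw an index i with probability 2**-i (i = 1, 2, ...) and return A(i).

    A is a caller-supplied function standing in for i -> A_i.
    """
    i = 1
    # Each fair coin flip that comes up "continue" moves to the next index.
    while rng.random() >= 0.5:
        i += 1
    return A(i)


def sample_lookback_process(rng, A, length):
    """An i.i.d. stretch of the lookback process of the given length."""
    return [sample_lookback(rng, A) for _ in range(length)]
```

For example, `sample_lookback_process(random.Random(1), lambda i: 10 ** i, 5)` returns five independent draws from the set {10, 100, 1000, ...}.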

Carrying out step 3: Take the independent product of the
..., a_{-2}, a_{-1}, a_0, a_1, a_2, ... process and the ..., c_{-2}, c_{-1}, c_0, c_1, c_2, ...
process and write it as
..., (a_{-2}, c_{-2}), (a_{-1}, c_{-1}), (a_0, c_0), (a_1, c_1), (a_2, c_2), ...
and call that the 0 process.

We now define a sequence of processes starting with the 0 process by
successively turning more and more of the c_i to question marks while never
altering the a process. To obtain the 1 process, let c1_i = "?" if none of
c_{i-1}, c_{i-2}, ..., c_{i-a_i} determines the identity of c_i. Otherwise, let c1_i = c_i.

After changing these unidentified c_i to question marks,
..., c_{-2}, c_{-1}, c_0, c_1, c_2, ... turns to ..., c1_{-2}, c1_{-1}, c1_0, c1_1, c1_2, ...,
i.e. each c1_i is either c_i or "?". The 1 process is

..., (a_{-2}, c1_{-2}), (a_{-1}, c1_{-1}), (a_0, c1_0), (a_1, c1_1), (a_2, c1_2), ...

To obtain the 2 process take any i where c1_i has not yet turned to a "?"
and turn it into "?" if none of c1_{i-1}, c1_{i-2}, ..., c1_{i-a_i} determines its identity,
to get ..., c2_{-2}, c2_{-1}, c2_0, c2_1, c2_2, ..., so that the 2 process is
..., (a_{-2}, c2_{-2}), (a_{-1}, c2_{-1}), (a_0, c2_0), (a_1, c2_1), (a_2, c2_2), ...

Clearly if c1_i is "?" so is c2_i, because you have less information at the
second stage than you do at the first (a "?" does not determine the identity
of anything). New question marks may occur at the second stage that
were not question marks at the first stage, because a term c_k which allowed
a given c_i to determine its identity may have changed to a question mark
at the first stage and hence was not available at the second stage to help c_i
determine its identity.

Continue this procedure to obtain the 3 process, 4 process etc., turning
more and more terms into question marks. At every stage we know all of the
a_i. Define ..., b_{-2}, b_{-1}, b_0, b_1, b_2, ... by b_i = "?" if cn_i = "?" for any n and
b_i = c_i otherwise. This limiting process, together with the a process, is to
be called the final process. It is not immediately obvious that it is not the
case that all b_i are "?", but we will ultimately see that many of the b_i are c_i.
This is exactly the procedure used in [1] and [5], and just as in those cases
the important issue was getting the past to redundantly tell you information
about the present. The final process is the product of

..., a_{-2}, a_{-1}, a_0, a_1, a_2, ... and
..., b_{-2}, b_{-1}, b_0, b_1, b_2, ..., which can also be written as
..., (a_{-2}, b_{-2}), (a_{-1}, b_{-1}), (a_0, b_0), (a_1, b_1), (a_2, b_2), ...
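The stage-by-stage conversion to question marks described above can be sketched on a finite window as follows. This is an illustration under stated assumptions, not the construction itself: the predicate tells(k, j) is a hypothetical stand-in for "term k determines the identity of term j", and the first `anchored` terms are simply assumed to survive, standing in for the infinite past that a finite window cannot represent.

```python
def censor(c, a, tells, anchored=1):
    """Iterate the question-mark stages on a finite window until nothing changes.

    c        -- list of symbols c_0, ..., c_{n-1}
    a        -- list of lookback values a_0, ..., a_{n-1}
    tells    -- tells(k, j) is True when term k encodes the identity of term j
    anchored -- number of initial terms assumed never to become "?"
    """
    n = len(c)
    alive = [True] * n
    while True:
        # Each stage is computed from the previous stage only, as in the text.
        new_alive = list(alive)
        for j in range(anchored, n):
            if not alive[j]:
                continue
            window = range(max(0, j - a[j]), j)
            if not any(alive[k] and tells(k, j) for k in window):
                new_alive[j] = False
        if new_alive == alive:
            break
        alive = new_alive
    return [c[j] if alive[j] else "?" for j in range(n)]


# Hypothetical example: term 3 is lost at stage 1, which costs term 4 at stage 2.
c = [10, 11, 12, 13, 14]
a = [1, 1, 2, 1, 3]
pairs = {(0, 1), (1, 2), (1, 3), (3, 4)}
result = censor(c, a, lambda k, j: (k, j) in pairs)
```

In the example, term 3 looks back only one step and sees nobody who tells it its identity, so it becomes "?" at the first stage. Term 4 relied on term 3 alone, so it becomes "?" at the second stage.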

Carrying out step 4: We now wish to establish Equation (24), Equation (25)
and Equation (26) for the final process. Equation (24) is given. To see
Equation (25), i.e. that a_0 is not only independent of all previous a_i but also
of all previous b_i, just note that a_0 is independent of all the previous a_i and
c_i in the 0 process and that all previous b_i are determined by the previous a_i
and the previous c_i. We now need only establish Equation (26), which we do
by showing that a given b_k is a question mark iff it is impossible to determine
its identity by looking back in the {b_i} process (that means looking at the
k-1, k-2, ..., k-a_k terms) and that otherwise it is still c_k. Your final b_k
will be a "?" iff it became such at a finite stage iff in one of the intermediate
processes everyone amongst those terms who determines its identity died
(i.e. was a question mark). If not, then at the end there is still someone left
to tell him his identity.

Carrying out step 5: All that remains is to show that the b process
determines the c process (i.e. that the b process extends the c process, i.e. that
by looking at the entire b process you can determine what the question marks
were before they turned to question marks).

We select a fixed ω, say ω_0, and then look at the a_i, b_i and c_i processes
for ω_0. It suffices to show that the b process determines c_0, because if you
can determine c_0 then you can determine all c_j by just
translating the argument by j.

Select ε > 0 and after selecting ε select a fixed huge number i, and in
particular make sure that i > f(ω_0). Break the negative numbers -j into
the following classes.

Class 1: j < A_i,

Class k, k > 1: A_{i+k-2} ≤ j < A_{i+k-1}.

By Comment 29 and the fact that A_{k+1} > 2A_k for all k, it follows that
for all k ≥ 2 there are at least (i + k - 2)!! integers -j in class k such that
f(T^{-j}(ω_0)) = i + k. Since A_i > A_{i-1}, in class 1 there are at least (i - 1)!!
integers -j such that f(T^{-j}(ω_0)) = i + 1. For all k including k = 1, refer to all
the -j in class k such that f(T^{-j}(ω_0)) = i + k as the special class k terms.

Define event k for all k ≥ 1 to be that there is a special class k term -j such
that a_{-j} = A_{i+k} (recall that for any particular j this has probability 1/2^{i+k}).
Then since event 1, event 2, ... have probabilities rapidly approaching 1
as k approaches infinity (because there are so many special class k terms),
the probability they all happen rapidly approaches 1 as i approaches infinity.
Since i was chosen after ε we can assume

they all happen with probability > 1 - ε (30)
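To get a feel for how quickly the events become near-certain, the failure probabilities can be bounded numerically. The sketch below rests on two assumptions that are not settled by the text: i!! is read as the double factorial, and each class k is credited with exactly the guaranteed minimum of (i + k - 2)!! special terms. The function names are hypothetical, and the sum is truncated at k_max, so it is only a partial sum of the union bound.

```python
import math


def double_factorial(m):
    """m!! = m (m - 2) (m - 4) ... ; an assumed reading of the paper's i!!."""
    result = 1
    while m > 1:
        result *= m
        m -= 2
    return result


def event_failure_probability(i, k, growth=double_factorial):
    """Upper bound on P(event k fails).

    Each of the growth(i + k - 2) special class k terms independently has
    lookback A_{i+k} with probability 2**-(i + k).
    """
    n_special = growth(i + k - 2)
    p_hit = 2.0 ** -(i + k)
    # (1 - p_hit) ** n_special, computed stably for tiny p_hit and huge n_special.
    return math.exp(n_special * math.log1p(-p_hit))


def all_events_failure_bound(i, k_max=40, growth=double_factorial):
    """Union bound on P(some event k fails), truncated at k = k_max."""
    return sum(event_failure_probability(i, k, growth)
               for k in range(1, k_max + 1))
```

Under these assumptions the bound is still large for moderate i and drops sharply as i grows, which matches the requirement that i be chosen huge after ε is fixed.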
Suppose they all happen. Let -j_k be a specific special class k term for which

a_{-j_k} = A_{i+k}

and because it is special

f(T^{-j_k}(ω_0)) = i + k. (31)

Since j_{k+1} is in class k+1, 0 < j_k < j_{k+1} < A_{i+k} < A_{i+k+1} and
f(T^{-j_{k+1}}(ω_0)) = i + k + 1 for every k. Applying Equation (28), replacing
ω by T^{-j_{k+1}}(ω_0), i by i + k + 1, and A_i by A_{i+k+1}, we get that
h(T^{-j_{k+1}}(ω_0)) determines h(T^{-j_k}(ω_0)). Now fix k and suppose we are going
from the 0 process to the 1 process and we want to know if c1_{-j_k} is a question
mark. We have a_{-j_k} = A_{i+k}, and since j_{k+1} < A_{i+k}, when -j_k looks back
A_{i+k} it sees -j_{k+1}, so -j_k does not turn to a question mark for any k. Using
this reasoning over and over, c_{-j_k} never turns to a question mark, so
b_{-j_k} = c_{-j_k} for all k. By Equation (28), -j_1 gives the identity of 0.
Therefore by Equation (30) c_0 is determined with probability more than 1 - ε,
and since ε is arbitrary c_0 is determined with probability 1. □
Proof of Theorem 1.7 b: We just finished carrying out step 5 where we
showed that from ..., b_{-2}, b_{-1}, b_0, b_1, b_2, ... we can recover
..., c_{-2}, c_{-1}, c_0, c_1, c_2, ..., and since ..., a_{-2}, a_{-1}, a_0, a_1, a_2, ... is the
same for both the 0 process and the final process, we can recover the 0 process
from the final process. The final process was chosen as a function (in fact a
factor) of the 0 process. Hence they are isomorphic. By Corollary 6.2 the final
process is a uniform martingale. Thus we have proved that if you cross a
nonperiodic process with an i.i.d. process which has the probability law
(1/2, 1/4, 1/8, ...) you get a process which is isomorphic to a uniform
martingale on a countable state space. The Ornstein isomorphism theorem says
that every two independent processes with the same entropy are isomorphic
even if that entropy is infinity (entropy 0 for a Bernoulli process is impossible
except in a trivial case). Any such nonzero entropy (including infinity) can be
achieved as the entropy of a countable partition whose probabilities are a
decreasing sequence of positive reals. Hence every nontrivial Bernoulli process
is isomorphic to an i.i.d. process on a countable alphabet with probability law
(a, b, c, ...) where a, b, c, ... is a decreasing sequence of positive real numbers.
The only way we used the precise sequence (1/2, 1/4, 1/8, ...) is that i!! is much
larger than the reciprocal of the i-th term of that sequence. So to handle the
general case just carry out the exact same proof, except use (a, b, c, ...) as your
lookback probabilities and instead of i!! use a function that grows much faster
than the reciprocals of that sequence. □



7 Appendix 

Our goal is to establish that the Markov chain of Equation (23) has a unique
stationary measure and that the resulting stationary process is Bernoulli.
Keep in mind throughout that the Markov chain runs backwards in time.

Definition 18. Throughout this section we assume the transition
probabilities of the backwards Markov chain of Equation (23). We will call
these the blue transition probabilities.

Definition 19. Let {X{i,j) : i,j integers} be the collection of 0,1 valued 
random variables for which we are trying to find their joint distribution. 

By the Carathéodory extension theorem it suffices to establish the joint
distribution of X(i,j) : -n < i < n, -n < j < n for any n (assuming these
joint distributions for different n are consistent with each other). The way
you get existence and uniqueness of a stationary measure for a finite state 
Markov chain is to prove the renewal theorem and from that it is trivial. Here 
we do essentially the same thing. We prove an analog for the renewal theorem 
with essentially the same proof as the proof of the renewal theorem and then 
existence and uniqueness of a stationary measure follows by essentially the 
same proof. 

Theorem 7.1. Analog of the Renewal theorem:

Let {a(i,j) : -n < i < n, -n < j < n} be a specific square array of
0s and 1s (i.e. a(i,j) ∈ {0,1} for -n < i < n, -n < j < n). Then for
every ε > 0 there exists an m such that for integers M > m and N > m and
any two doubly infinite sequences of 0s and 1s A(i,M), -∞ < i < ∞, and
B(i,N), -∞ < i < ∞, we have that

|P(X(i,j) = a(i,j), -n < i < n, -n < j < n | X(i,M) = A(i,M), -∞ < i < ∞)
 - P(X(i,j) = a(i,j), -n < i < n, -n < j < n | X(i,N) = B(i,N), -∞ < i < ∞)| < ε.

For exactly the same reason as in the finite case, if we can prove
Theorem 7.1 then there is a unique stationary measure for the Markov chain.

Proof. We use one of the proofs that is used in the finite case. Let X(i,j) be
the Markov chain with the blue transition probabilities running from
the M column back given

X(i,M) = A(i,M) : -∞ < i < ∞

and let Y(i,j) be the Markov chain with the blue transition probabilities
running from the N column back given

Y(i,N) = B(i,N) : -∞ < i < ∞

The proof of Theorem 7.1 will be complete if, so long as m is sufficiently
large, we can couple the processes X(i,j) and Y(i,j) so that with probability
> 1 - ε, the two processes are identical for all -n < i < n, -n < j < n.

Definition 20. f is the infinite-to-one map from the integers to the integers
which we use to define the Markov chain.

Definition 21. For every element (a, b) ∈ Z^2 we say that (a + 1, f(b)) is
the father of (a, b).

Definition 22. Ancestor is the transitive closure of the relationship "father"
(e.g. if a is the father of b, b is the father of c, and c is the father of d,
then a is an ancestor of d).

Lemma 7.2. If a < b are integers and c is an integer then there is a unique
d such that (b, d) is an ancestor of (a, c).

Proof. This follows trivially from the fact that every (e, f) has exactly one
father and that father has x coordinate e + 1. □
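Lemma 7.2 amounts to iterating the map f once per column. A minimal sketch of that computation follows. The function names are hypothetical, and the example map j -> j // 2 is only 2-to-1, used here as a simple stand-in for the infinite-to-one map of Definition 20.

```python
def father(site, f):
    """Father of (a, b) is (a + 1, f(b)), as in Definition 21."""
    a, b = site
    return (a + 1, f(b))


def ancestor_in_column(site, column, f):
    """The unique ancestor of `site` whose first coordinate equals `column`."""
    a, _ = site
    if column <= a:
        raise ValueError("column must exceed the site's first coordinate")
    current = site
    while current[0] < column:
        current = father(current, f)
    return current


# Illustrative map only; it is not infinite-to-one.
halve = lambda j: j // 2
example = ancestor_in_column((0, 13), 3, halve)
```

Here the chain of fathers of (0, 13) is (1, 6), (2, 3), (3, 1), so the ancestor in column 3 is (3, 1).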

Conclusion of proof of Theorem 7.1: We now describe the coupling.
Assume WLOG that M < N. Recall that the Markov chain runs backwards
(in the forwards direction each column determines the next). Start with
the condition Y(i,N) = B(i,N) : -∞ < i < ∞ and run that process until
the second coordinate reaches M. Then start running the X process starting
from X(i,M) = A(i,M) : -∞ < i < ∞. We start by letting X(i,M), -∞ < i < ∞
be independent of Y(i,M), -∞ < i < ∞. For a given (j, M) the two
processes may or may not exhibit the same number (either both 1 or both 0).
If they do, then they have the same probability law for (k, M - 1) for any k
where f(k) = j, so we can couple them to be the same for (k, M - 1) for any
such k. Otherwise for such a k we couple them independently, and that gives
the probability that they end up the same at (k, M - 1) to be 4/9. We couple
each coordinate of the M - 1 column independently. Continue coupling in
this manner. In other words, if a given father is the same for both processes
then each of his sons is the same for both processes, and if the fathers differ
then the sons are the same with probability 4/9. We continue that coupling from
then on, making all couplings of a given column independent given the next
column (keep in mind that the next column is actually the previous column
from the standpoint of the backwards running Markov chain). For a given
(i,j) : -n < i < n, -n < j < n there is a unique k such that (k, M) is an
ancestor of (i,j), so if the two values on (k, M) are the same they will remain
the same for the son, and hence for the grandson etc., and hence ultimately
they will be the same for (i,j). Even if they differ on (k, M) they have a 4/9
chance that they will be the same on the son of (k, M), and then if they differ
on the son they have a 4/9 chance of agreeing on the grandson etc. Once
they agree they will continue to agree, so since there are many generations
between (k, M) and (i,j) they will almost certainly agree on (i,j). In fact
even if they don't agree by the time the second coordinate reaches m, if m is
large there is a lot of time for them to agree after that. Since this is true for all
(i,j) : -n < i < n, -n < j < n, it follows that they agree on that square
with a probability bounded below by a function of m which goes to 1 as m
approaches ∞. □
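The numbers in this coupling can be checked with exact arithmetic. The sketch below computes the 4/9 agreement probability for independently coupled sons of differing fathers, the resulting bound on the chance that a lineage still disagrees after a given number of generations, and a union bound over a (2n+1) by (2n+1) square. The function names and the use of a single common generation count for every site of the square are simplifying assumptions of this illustration.

```python
from fractions import Fraction

P_SAME = Fraction(2, 3)  # probability that a son copies his father


def agree_probability_when_fathers_differ(p_same=P_SAME):
    """Sons coupled independently agree iff exactly one of them flips."""
    return 2 * p_same * (1 - p_same)


def still_disagree_bound(generations, p_same=P_SAME):
    """Bound on P(a lineage disagrees after this many generations)."""
    per_generation = 1 - agree_probability_when_fathers_differ(p_same)
    return per_generation ** generations


def square_failure_bound(n, generations, p_same=P_SAME):
    """Union bound over the (2n+1)^2 sites of the square."""
    return (2 * n + 1) ** 2 * still_disagree_bound(generations, p_same)
```

For instance, `still_disagree_bound(3)` equals (5/9)^3 = 125/729, and the square bound tends to 0 as the number of generations grows, which is the content of the last sentence of the proof.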

Theorem 7.3. With the above Markov chain, the transformation which maps
a configuration X(i,j) = a(i,j); -∞ < i < ∞, -∞ < j < ∞ to X(i,j) =
a(i - 1, j); -∞ < i < ∞, -∞ < j < ∞ is Bernoulli.

Idea of proof: Reading backwards this is essentially a mixing Markov 
chain and using the proof of the analog of the renewal theorem, the proof 
that it is Bernoulli is identical to the proof that a mixing Markov chain on a 
finite state space is Bernoulli. 

Proof. Ornstein showed the following two results, [9] and [10] resp., and
result 3 is trivial directly from the definition of Bernoulli.

Ornstein Result 1: Let P_n be an increasing sequence of finite partitions
which separate points and let T be a measure preserving transformation. If
the T, P_n process is Bernoulli for all n then T is Bernoulli.

Ornstein Result 2: Let P be a finite partition and T be a measure
preserving transformation. If for every ε > 0 there is a coupling of two copies
of the T, P process so that they are independent in the past and the expected
mean Hamming distance of the n future is less than ε for sufficiently large n,
then the T, P process is Bernoulli.

Result 3: The T, P process is Bernoulli iff the T^{-1}, P process is Bernoulli
(this is trivial).

From the above three results, letting

W(k) be the value of X(i,j) : -n < i < n, k - n < j < k + n,

if we can couple the W process with another process, referred to as the Y
process, whose distribution is identical to the W process so that W(k) : k > 0
is independent of Y(k) : k > 0 but for sufficiently large n,

P(#{i : -n < i < 0 and W(i) ≠ Y(i)} > εn) < ε,

then the X process is Bernoulli and the proof will be complete.

Extending the W process to the X process (on the whole two dimensional
lattice) and extending the Y process to a process to be called the X' process
with the same distribution as the X process, it suffices to get a coupling of
the entire X(i,j) process with the entire X'(i,j) process such that

1) X(i,j), j > -n is independent of X'(i,j), j > -n.

2) For k sufficiently large P{W{-k) ^ Y{-k)) < 

The only significance of is that it is smaller than e^. 

But we are already done because we can just couple the two processes 
independently for j > —n and then couple them as in the proof of the analog 
to the renewal theorem for j < —n and the proof of that theorem establishes 
(2). □